Amazon Web Services (AWS)

Amazon Web Services offers cloud computing services on which you can run Flink.

EMR: Elastic MapReduce

Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly set up a Hadoop cluster. This is the recommended way to run Flink on AWS because it takes care of the cluster setup for you.

Standard EMR Installation

Flink is a supported application on Amazon EMR. Amazon’s documentation describes configuring Flink, creating and monitoring a cluster, and working with jobs.

Custom EMR Installation

Amazon EMR is regularly updated with new releases, but a Flink version that is not available on EMR can be installed manually on a stock EMR cluster.

Create EMR Cluster

The EMR documentation contains examples showing how to start an EMR cluster. You can follow that guide and install any EMR release. You do not need the All Applications option of the EMR release; Core Hadoop is sufficient.

Note: Access to S3 buckets requires configuring IAM roles when creating the EMR cluster.
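
As a rough sketch of the cluster creation described above, the AWS CLI can request a cluster with only the Hadoop application and the default EMR IAM roles. The cluster name, key pair name, release label, instance type, and instance count are placeholder assumptions, not values prescribed by this guide. The AWS_CMD variable is a hypothetical helper that makes the script a dry run by default, so the command is printed rather than executed.

    ```shell
    # Dry run by default: the command is printed, not executed.
    # Set AWS_CMD=aws to really create (and pay for) a cluster.
    AWS_CMD="${AWS_CMD:-echo aws}"

    # Placeholder values (assumptions for illustration only).
    CLUSTER_NAME="flink-custom"
    RELEASE_LABEL="emr-4.5.0"     # example release; any EMR release can be used
    KEY_NAME="my-key-pair"        # hypothetical EC2 key pair for SSH access

    create_cluster() {
      # --applications Name=Hadoop requests only the Hadoop application
      # rather than the full application set.
      # --use-default-roles attaches EMR_DefaultRole and EMR_EC2_DefaultRole,
      # which must already exist (see `aws emr create-default-roles`).
      $AWS_CMD emr create-cluster \
        --name "$CLUSTER_NAME" \
        --release-label "$RELEASE_LABEL" \
        --applications Name=Hadoop \
        --instance-type m3.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --ec2-attributes "KeyName=$KEY_NAME"
    }

    create_cluster
    ```

If jobs need to read from or write to S3, the role attached to the cluster's EC2 instances must grant access to the relevant buckets; the default roles may or may not be sufficient for your setup.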

Install Flink on EMR Cluster

After creating your cluster, you can connect to the master node and install Flink:

  1. Go to the Downloads page and download a binary version of Flink matching the Hadoop version of your EMR cluster, e.g. Hadoop 2.7 for EMR releases 4.3.0, 4.4.0, or 4.5.0.
  2. Make sure all the Hadoop dependencies are in the classpath before you submit any jobs to EMR:
       export HADOOP_CLASSPATH=`hadoop classpath`
  3. Extract the Flink distribution. After setting the Hadoop config directory, you are ready to deploy Flink jobs via YARN:
       HADOOP_CONF_DIR=/etc/hadoop/conf ./bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
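
To pick the matching Flink binary in step 1, it can help to check which Hadoop version the cluster actually runs. The following is a minimal sketch meant to be run on the master node. The helper name hadoop_major_minor is hypothetical, and the sketch assumes that the first line of `hadoop version` has the form "Hadoop <version>".

    ```shell
    # Extract "major.minor" from the first line of `hadoop version` output,
    # e.g. "Hadoop 2.7.1" becomes "2.7".
    hadoop_major_minor() {
      awk 'NR == 1 { split($2, v, "."); print v[1] "." v[2] }'
    }

    if command -v hadoop >/dev/null 2>&1; then
      echo "Download a Flink binary built for Hadoop $(hadoop version | hadoop_major_minor)"
      # The config directory used when submitting jobs should exist as well.
      [ -d /etc/hadoop/conf ] || echo "warning: /etc/hadoop/conf not found" >&2
    else
      echo "hadoop not found on PATH; run this on the EMR master node" >&2
    fi
    ```

The parsing lives in its own function so it can be tried without a cluster, for example with `echo 'Hadoop 2.7.1' | hadoop_major_minor`.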
