MapR Setup

This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

This documentation provides instructions on how to prepare Flink for YARN executions on a MapR cluster.

Running Flink on YARN with MapR
- Building Flink for MapR
- Job Submission Client Setup
Running Flink with a Secured MapR Cluster

Running Flink on YARN with MapR

The instructions below assume MapR version 5.2.0. They will guide you to be able to start submitting Flink on YARN jobs or sessions to a MapR cluster.

Building Flink for MapR

In order to run Flink on MapR, Flink needs to be built with MapR’s own Hadoop and Zookeeper distribution. Simply build Flink using Maven with the following command from the project root directory:

mvn clean install -DskipTests -Pvendor-repos,mapr \
    -Dhadoop.version=2.7.0-mapr-1607 \
    -Dzookeeper.version=3.4.5-mapr-1604

The vendor-repos build profile adds MapR’s repository to the build so that MapR’s Hadoop / Zookeeper dependencies can be fetched. The mapr build profile additionally resolves some dependency clashes between MapR and Flink, as well as ensuring that the native MapR libraries on the cluster nodes are used. Both profiles must be activated.

By default the mapr profile builds with Hadoop / Zookeeper dependencies for MapR version 5.2.0, so you don’t need to explicitly override the hadoop.version and zookeeper.version properties. For different MapR versions, simply override these properties to appropriate values. The corresponding Hadoop / Zookeeper distributions for each MapR version can be found on MapR documentations such as here.

Job Submission Client Setup

The client submitting Flink jobs to MapR also needs to be prepared with the below setups.

Ensure that MapR’s JAAS config file is picked up to avoid login failures:

export JVM_ARGS=-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf

Make sure that the yarn.nodemanager.resource.cpu-vcores property is set in yarn-site.xml:

<!-- in /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml -->

<configuration>
...

<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>...</value>
</property>

...
</configuration>

Also remember to set the YARN_CONF_DIR or HADOOP_CONF_DIR environment variables to the path where yarn-site.xml is located:

export YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/

Make sure that the MapR native libraries are picked up in the classpath:

export FLINK_CLASSPATH=/opt/mapr/lib/*

If you’ll be starting Flink on YARN sessions with yarn-session.sh, the below is also required:

export CC_CLASSPATH=/opt/mapr/lib/*

Running Flink with a Secured MapR Cluster

Note: In Flink 1.2.0, Flink’s Kerberos authentication for YARN execution has a bug that forbids it to work with MapR Security. Please upgrade to later Flink versions in order to use Flink with a secured MapR cluster. For more details, please see FLINK-5949.

Flink’s Kerberos authentication is independent of MapR’s Security authentication. With the above build procedures and environment variable setups, Flink does not require any additional configuration to work with MapR Security.

Users simply need to login by using MapR’s maprlogin authentication utility. Users that haven’t acquired MapR login credentials would not be able to submit Flink jobs, erroring with:

java.lang.Exception: unable to establish the security context
Caused by: o.a.f.r.security.modules.SecurityModule$SecurityInstallException: Unable to set the Hadoop login user
Caused by: java.io.IOException: failure to login: Unable to obtain MapR credentials