
Cluster Execution

Flink programs can run in a distributed fashion on clusters of many machines. There are two ways to send a program to a cluster for execution:

Command Line Interface

The command line interface lets you submit packaged programs (JARs) to a cluster (or single machine setup).

Please refer to the Command Line Interface documentation for details.

Remote Environment

The remote environment lets you execute Flink Java programs on a cluster directly. It points to the cluster on which you want to execute the program.

Maven Dependency

If you are developing your program as a Maven project, you have to add the flink-clients module using this dependency:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.11.6</version>
</dependency>

Example

The following illustrates the use of the RemoteEnvironment:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteExecutionExample {

    public static void main(String[] args) throws Exception {
        // Point the environment at the cluster's JobManager and attach the JAR
        // that contains the user code (here, the anonymous FilterFunction).
        ExecutionEnvironment env = ExecutionEnvironment
            .createRemoteEnvironment("flink-jobmanager", 8081, "/home/user/udfs.jar");

        DataSet<String> data = env.readTextFile("hdfs://path/to/file");

        data
            .filter(new FilterFunction<String>() {
                @Override
                public boolean filter(String value) {
                    return value.startsWith("http://");
                }
            })
            .writeAsText("hdfs://path/to/result");

        env.execute();
    }
}

Note that the program contains custom user code and therefore requires a JAR file that contains the classes of that code. The constructor of the remote environment takes the path(s) to the JAR file(s).
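
If the user code is spread across several JAR files, they can all be attached, since createRemoteEnvironment accepts a variable number of JAR paths. The following is a minimal sketch rather than part of the original example; the host name, port, JAR paths, and parallelism value are placeholders for your own setup.

import org.apache.flink.api.java.ExecutionEnvironment;

public class MultiJarRemoteExample {

    public static void main(String[] args) throws Exception {
        // Placeholder host, port, and JAR paths -- adjust them to your setup.
        // Additional JARs (for example, libraries your user functions depend on)
        // can be shipped alongside the JAR that contains the user code itself.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
            "flink-jobmanager",
            8081,
            "/home/user/udfs.jar",
            "/home/user/libs/extra-dependency.jar");

        // Optionally set the default parallelism used for the remote execution.
        env.setParallelism(4);

        // ... define the program (sources, transformations, sinks) as above ...

        env.execute();
    }
}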
