Registers a default serializer for the given class and its sub-classes with Kryo.
Registers a default serializer for the given class and its sub-classes with Kryo.
Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
Registers a default serializer for the given class and its sub-classes with Kryo.
Generic method to create an input DataSet with an org.apache.flink.api.common.io.InputFormat.
Creates the program's org.apache.flink.api.common.Plan.
Creates the program's org.apache.flink.api.common.Plan. The plan is a description of all data sources, data sinks, and operations and how they interact, as an isolated unit that can be executed with a org.apache.flink.api.common.PlanExecutor. Obtaining a plan and starting it with an executor is an alternative way to run a program and is only possible if the program only consists of distributed operations.
Triggers the program execution.
Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with the given name.
The result of the job execution, containing elapsed time and accumulators.
Triggers the program execution.
Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with a generated default name.
The result of the job execution, containing elapsed time and accumulators.
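As a sketch of how the pieces above fit together (assuming the Flink Scala API, org.apache.flink.api.scala, is on the classpath; the output path is hypothetical): sources and transformations are only declared, and nothing runs until execute is called.

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Source -> transformations -> sink; nothing executes yet.
val counts = env
  .fromElements("to be", "or not", "to be")
  .flatMap(_.split(" "))
  .map((_, 1))
  .groupBy(0)
  .sum(1)

counts.writeAsText("file:///tmp/wordcount-out") // hypothetical output path

// Only now is the plan built and run; the result carries the elapsed
// time and any accumulator values.
val result = env.execute("WordCount sketch")
println(s"Runtime: ${result.getNetRuntime} ms")
```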
Creates a DataSet from the given Iterator.
Creates a DataSet from the given Iterator.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
Creates a DataSet from the given non-empty Iterable.
Creates a DataSet from the given non-empty Iterable.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
Creates a new data set that contains the given elements.
Creates a new data set that contains the given elements.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
Creates a new data set that contains elements in the iterator.
Creates a new data set that contains elements in the iterator. The iterator is splittable, allowing the framework to create a parallel data source that returns the elements in the iterator.
Creates a new data set that contains a sequence of numbers.
Creates a new data set that contains a sequence of numbers. The data set will be created in parallel, so there is no guarantee about the order of the elements.
The number to start at (inclusive).
The number to stop at (inclusive).
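A minimal sketch (assuming the Flink Scala API is on the classpath) of generating such a sequence and aggregating it:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Numbers 1..100, both endpoints inclusive; the source is parallel,
// so the order of elements is not deterministic.
val numbers: DataSet[Long] = env.generateSequence(1, 100)

// Order-insensitive aggregations are safe on such a source.
val total = numbers.reduce(_ + _)
```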
Gets the config object.
Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph.
Gets the UUID by which this environment is identified.
Gets the UUID by which this environment is identified. The UUID identifies the execution context in the cluster or local environment.
Gets the UUID by which this environment is identified, as a string.
Gets the UUID by which this environment is identified, as a string.
The Java ExecutionEnvironment that this environment wraps.
Gets the JobExecutionResult of the last executed job.
Returns the default parallelism for this execution environment.
Returns the default parallelism for this execution environment. Note that this value can be overridden by individual operations using DataSet.setParallelism.
Returns the specified restart strategy configuration.
Returns the specified restart strategy configuration.
The restart strategy configuration to be used
Gets the session timeout for this environment.
Gets the session timeout for this environment. The session timeout defines how long after an execution the job and its intermediate results will be kept for future interactions.
The session timeout, in seconds.
Creates a DataSet by reading the given CSV file.
Creates a DataSet by reading the given CSV file. The type parameter must be used to specify a Tuple type that has the same number of fields as there are fields in the CSV file. If the number of fields in the CSV file is not the same, the includedFields parameter can be used to only read specific fields.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The string that separates lines, defaults to newline.
The string that separates individual fields, defaults to ",".
The character to use for quoted String parsing, disabled by default.
Whether the first line in the file should be ignored.
Lines that start with the given String are ignored, disabled by default.
Whether the parser should silently ignore malformed lines.
The fields in the file that should be read. Per default all fields are read.
The fields of the POJO which are mapped to CSV fields.
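The parameters above can be combined as in the following sketch (assuming the Flink Scala API; the file path and column layout are hypothetical — a three-column file "name;city;age" of which only fields 0 and 2 are read):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Read only fields 0 and 2 into a Tuple2, skipping a header row.
// Note: the Tuple arity (2) matches the number of *included* fields.
val persons = env.readCsvFile[(String, Int)](
  "file:///tmp/persons.csv", // hypothetical path
  fieldDelimiter = ";",
  ignoreFirstLine = true,
  includedFields = Array(0, 2))
```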
Creates a new DataSource by reading the specified file using the custom org.apache.flink.api.common.io.FileInputFormat.
Creates a DataSet that represents the primitive type produced by reading the given file in a delimited way. This method is similar to readCsvFile with a single field, but it produces a DataSet of primitive types rather than Tuples.
Creates a DataSet that represents the primitive type produced by reading the given file in a delimited way. This method is similar to readCsvFile with a single field, but it produces a DataSet of primitive types rather than Tuples. The type parameter must be used to specify the primitive type.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The string that separates primitives, defaults to newline.
Creates a DataSet of Strings produced by reading the given file line wise.
Creates a DataSet of Strings produced by reading the given file line wise.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The name of the character set used to read the file. Default is UTF-8.
Creates a DataSet of Strings produced by reading the given file line wise.
Creates a DataSet of Strings produced by reading the given file line wise. This method is similar to readTextFile, but it produces a DataSet with mutable StringValue objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and garbage collection heavy.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The name of the character set used to read the file. Default is UTF-8.
Registers a file at the distributed cache under the given name.
Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (as long as all relevant workers have access to them), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.
The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache.
The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")
The name under which the file is registered.
Flag indicating whether the file should be executable
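A sketch of the registration-and-lookup round trip described above (assuming the Flink Scala API; the HDFS path and the "lookup" name are hypothetical):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import java.io.File

val env = ExecutionEnvironment.getExecutionEnvironment

// Hypothetical lookup file; every worker receives a local copy.
env.registerCachedFile("hdfs://host:port/data/lookup.txt", "lookup")

val enriched = env.fromElements("a", "b").map(
  new RichMapFunction[String, String] {
    private var lookupFile: File = _

    override def open(parameters: Configuration): Unit = {
      // Resolve the registered name to the worker-local copy.
      lookupFile = getRuntimeContext.getDistributedCache.getFile("lookup")
    }

    override def map(value: String): String =
      s"$value -> ${lookupFile.getName}"
  })
```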
Registers the given type with the serialization stack.
Registers the given type with the serialization stack. If the type is eventually serialized as a POJO, then the type is registered with the POJO serializer. If the type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.
Registers the given type with the serializer at the KryoSerializer.
Registers the given type with the serializer at the KryoSerializer.
Registers the given type with the serializer at the KryoSerializer.
Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
Sets the parallelism for operations executed through this environment.
Sets the parallelism for operations executed through this environment. Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instances. This value can be overridden by specific operations using DataSet.setParallelism.
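The interplay between the environment default and a per-operator override can be sketched as follows (assuming the Flink Scala API; the parallelism values are arbitrary):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4) // default for all operators in this environment

val counts = env
  .fromElements("a", "b", "a")
  .map((_, 1))
  .groupBy(0)
  .sum(1)
  .setParallelism(1) // per-operator override of the environment default
```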
Sets the restart strategy configuration.
Sets the restart strategy configuration. The configuration specifies which restart strategy will be used for the execution graph in case of a restart.
Restart strategy configuration to be set
Sets the session timeout to hold the intermediate results of a job.
Sets the session timeout to hold the intermediate results of a job. The updated timeout takes effect only for future executions.
The timeout in seconds.
Starts a new session, discarding all intermediate results.
Starts a new session, discarding all intermediate results.
Gets the number of times the system will try to re-execute failed tasks.
Gets the number of times the system will try to re-execute failed tasks. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.
This method will be replaced by getRestartStrategy. The FixedDelayRestartStrategyConfiguration contains the number of execution retries.
Sets the number of times that failed tasks are re-executed.
Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.
This method will be replaced by setRestartStrategy(). The FixedDelayRestartStrategyConfiguration contains the number of execution retries.
The ExecutionEnvironment is the context in which a program is executed. A local environment will cause execution in the current JVM, a remote environment will cause execution on a remote cluster installation.
The environment provides methods to control the job execution (such as setting the parallelism) and to interact with the outside world (data access).
To get an execution environment use the methods on the companion object:
Use ExecutionEnvironment#getExecutionEnvironment to get the correct environment depending on where the program is executed. If it is run inside an IDE, a local environment will be created. If the program is submitted to a cluster, a remote execution environment will be created.
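A minimal sketch of obtaining and using the environment via the companion object (assuming the Flink Scala API is on the classpath):

```scala
import org.apache.flink.api.scala._

// Resolves to a local environment inside an IDE or test, and to a
// remote environment when the program is submitted to a cluster.
val env = ExecutionEnvironment.getExecutionEnvironment

val doubled = env.fromElements(1, 2, 3).map(_ * 2)
```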