org.apache.flink.api.scala

ExecutionEnvironment

class ExecutionEnvironment extends AnyRef

The ExecutionEnvironment is the context in which a program is executed. A local environment will cause execution in the current JVM, while a remote environment will cause execution on a remote cluster installation.

The environment provides methods to control the job execution (such as setting the parallelism) and to interact with the outside world (data access).

To get an execution environment use the methods on the companion object:

Use ExecutionEnvironment#getExecutionEnvironment to get the correct environment depending on where the program is executed. If it is run inside an IDE, a local environment will be created. If the program is submitted to a cluster, a remote execution environment will be created.
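
For example, a minimal sketch of a self-contained program (the element values and the doubling step are purely illustrative):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    val data = env.fromElements(1, 2, 3, 4)
    // print() is a sink operation that triggers execution of this small program.
    data.map(_ * 2).print()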

Annotations
@Public()
Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new ExecutionEnvironment(javaEnv: java.ExecutionEnvironment)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def addDefaultKryoSerializer[T <: Serializer[_] with Serializable](clazz: Class[_], serializer: T): Unit

    Registers a default serializer for the given class and its sub-classes at Kryo.

    Registers a default serializer for the given class and its sub-classes at Kryo.

    Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
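
    A minimal sketch, assuming a hypothetical user type MyCustomType and a hand-written Kryo serializer for it (neither ships with Flink):

      import com.esotericsoftware.kryo.{Kryo, Serializer}
      import com.esotericsoftware.kryo.io.{Input, Output}
      import org.apache.flink.api.scala._

      class MyCustomType(val value: String)                    // hypothetical user type

      // Must mix in Serializable so the instance can be shipped to the workers.
      class MyCustomTypeSerializer extends Serializer[MyCustomType] with Serializable {
        override def write(kryo: Kryo, output: Output, obj: MyCustomType): Unit =
          output.writeString(obj.value)
        override def read(kryo: Kryo, input: Input, clazz: Class[MyCustomType]): MyCustomType =
          new MyCustomType(input.readString())
      }

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.addDefaultKryoSerializer(classOf[MyCustomType], new MyCustomTypeSerializer)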

  7. def addDefaultKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit

    Registers a default serializer for the given class and its sub-classes at Kryo.

  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. def createInput[T](inputFormat: InputFormat[T, _])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Generic method to create an input DataSet with an org.apache.flink.api.common.io.InputFormat.

  11. def createProgramPlan(jobName: String = ""): Plan

    Creates the program's org.apache.flink.api.common.Plan.

    Creates the program's org.apache.flink.api.common.Plan. The plan is a description of all data sources, data sinks, and operations and how they interact, as an isolated unit that can be executed with a org.apache.flink.api.common.PlanExecutor. Obtaining a plan and starting it with an executor is an alternative way to run a program and is only possible if the program only consists of distributed operations.

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def execute(jobName: String): JobExecutionResult

    Triggers the program execution.

    Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText, DataSet.write), or other generic data sinks created with DataSet.output.

    The program execution will be logged and displayed with the given name.

    returns

    The result of the job execution, containing elapsed time and accumulators.
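
    For instance, a sketch of a word-count style job in which the file paths and job name are placeholders:

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      val counts = env.readTextFile("file:///tmp/input.txt")
        .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
        .map((_, 1))
        .groupBy(0)
        .sum(1)
      counts.writeAsCsv("file:///tmp/wordcounts")        // a sink operation
      val result = env.execute("WordCount Example")
      println(s"Net runtime: ${result.getNetRuntime} ms")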

  15. def execute(): JobExecutionResult

    Triggers the program execution.

    Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText, DataSet.write), or other generic data sinks created with DataSet.output.

    The program execution will be logged and displayed with a generated default name.

    returns

    The result of the job execution, containing elapsed time and accumulators.

  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. def fromCollection[T](data: Iterator[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a DataSet from the given Iterator.

    Creates a DataSet from the given Iterator.

    Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

  18. def fromCollection[T](data: Iterable[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a DataSet from the given non-empty Iterable.

    Creates a DataSet from the given non-empty Iterable.

    Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

  19. def fromElements[T](data: T*)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a new data set that contains the given elements.

    Creates a new data set that contains the given elements.

    Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
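
    A short sketch of the collection-based sources (the values are illustrative):

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      val letters = env.fromCollection(Seq("a", "b", "c"))   // from an Iterable
      val numbers = env.fromElements(1, 2, 3, 4)             // from varargs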

  20. def fromParallelCollection[T](iterator: SplittableIterator[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a new data set that contains elements in the iterator.

    Creates a new data set that contains elements in the iterator. The iterator is splittable, allowing the framework to create a parallel data source that returns the elements in the iterator.
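
    For example, using org.apache.flink.util.NumberSequenceIterator, a SplittableIterator shipped with Flink (the range is illustrative):

      import org.apache.flink.api.scala._
      import org.apache.flink.util.NumberSequenceIterator

      val env = ExecutionEnvironment.getExecutionEnvironment
      // A parallel source producing the numbers 1 to 10,000,000.
      val numbers = env.fromParallelCollection(new NumberSequenceIterator(1L, 10000000L))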

  21. def generateSequence(from: Long, to: Long): DataSet[Long]

    Creates a new data set that contains a sequence of numbers.

    Creates a new data set that contains a sequence of numbers. The data set will be created in parallel, so there is no guarantee about the order of the elements.

    from

    The number to start at (inclusive).

    to

    The number to stop at (inclusive).
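
    For example:

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      // A parallel source of the numbers 1 to 1000, in no guaranteed order.
      val numbers: DataSet[Long] = env.generateSequence(1, 1000)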

  22. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  23. def getConfig: ExecutionConfig

    Gets the config object.

  24. def getExecutionPlan(): String

    Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph.

  25. def getId: JobID

    Gets the UUID by which this environment is identified.

    Gets the UUID by which this environment is identified. The UUID sets the execution context in the cluster or local environment.

    Annotations
    @PublicEvolving()
  26. def getIdString: String

    Gets the UUID by which this environment is identified, as a string.

    Gets the UUID by which this environment is identified, as a string.

    Annotations
    @PublicEvolving()
  27. def getJavaEnv: java.ExecutionEnvironment

    returns

    the Java Execution environment.

  28. def getLastJobExecutionResult: JobExecutionResult

    Gets the JobExecutionResult of the last executed job.

  29. def getParallelism: Int

    Returns the default parallelism for this execution environment.

    Returns the default parallelism for this execution environment. Note that this value can be overridden by individual operations using DataSet.setParallelism.

  30. def getRestartStrategy: RestartStrategyConfiguration

    Returns the specified restart strategy configuration.

    Returns the specified restart strategy configuration.

    returns

    The restart strategy configuration to be used

    Annotations
    @PublicEvolving()
  31. def getSessionTimeout: Long

    Gets the session timeout for this environment.

    Gets the session timeout for this environment. The session timeout defines for how long after an execution, the job and its intermediate results will be kept for future interactions.

    returns

    The session timeout, in seconds.

    Annotations
    @PublicEvolving()
  32. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  33. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  34. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  35. final def notify(): Unit

    Definition Classes
    AnyRef
  36. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  37. def readCsvFile[T](filePath: String, lineDelimiter: String = "\n", fieldDelimiter: String = ",", quoteCharacter: Character = null, ignoreFirstLine: Boolean = false, ignoreComments: String = null, lenient: Boolean = false, includedFields: Array[Int] = null, pojoFields: Array[String] = null)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a DataSet by reading the given CSV file.

    Creates a DataSet by reading the given CSV file. The type parameter must be used to specify a Tuple type that has the same number of fields as there are fields in the CSV file. If the number of fields in the CSV file is not the same, the includedFields parameter can be used to only read specific fields.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    lineDelimiter

    The string that separates lines, defaults to newline.

    fieldDelimiter

    The string that separates individual fields, defaults to ",".

    quoteCharacter

    The character to use for quoted String parsing, disabled by default.

    ignoreFirstLine

    Whether the first line in the file should be ignored.

    ignoreComments

    Lines that start with the given String are ignored, disabled by default.

    lenient

    Whether the parser should silently ignore malformed lines.

    includedFields

    The fields in the file that should be read. Per default all fields are read.

    pojoFields

    The fields of the POJO which are mapped to CSV fields.
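
    A sketch with a hypothetical CSV file; in the Scala API the target type is typically a case class or a tuple whose fields map to the CSV columns by position:

      import org.apache.flink.api.scala._

      case class Person(name: String, age: Int)               // illustrative type

      val env = ExecutionEnvironment.getExecutionEnvironment
      val persons = env.readCsvFile[Person](
        "file:///tmp/people.csv",
        ignoreFirstLine = true)

      // Alternatively, read only selected columns into a tuple type:
      val rows = env.readCsvFile[(String, Int)](
        "file:///tmp/people.csv",
        includedFields = Array(0, 2))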

  38. def readFile[T](inputFormat: FileInputFormat[T], filePath: String)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a new DataSource by reading the specified file using the custom org.apache.flink.api.common.io.FileInputFormat.

  39. def readFileOfPrimitives[T](filePath: String, delimiter: String = "\n")(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Creates a DataSet that represents the primitive type produced by reading the given file in a delimited way.

    Creates a DataSet that represents the primitive type produced by reading the given file in a delimited way. This method is similar to readCsvFile with a single field, but it produces a DataSet not through a Tuple. The type parameter must be used to specify the primitive type.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    delimiter

    The string that separates primitives, defaults to newline.
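
    A brief sketch, assuming a file that contains one integer per line (the path is a placeholder):

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      val numbers: DataSet[Int] = env.readFileOfPrimitives[Int]("file:///tmp/numbers.txt")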

  40. def readTextFile(filePath: String, charsetName: String = "UTF-8"): DataSet[String]

    Creates a DataSet of Strings produced by reading the given file line wise.

    Creates a DataSet of Strings produced by reading the given file line wise.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    charsetName

    The name of the character set used to read the file. Default is UTF-8.
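
    For example (the path is a placeholder):

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      val lines: DataSet[String] = env.readTextFile("hdfs://namenode:9000/path/to/file")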

  41. def readTextFileWithValue(filePath: String, charsetName: String = "UTF-8"): DataSet[StringValue]

    Creates a DataSet of Strings produced by reading the given file line wise.

    Creates a DataSet of Strings produced by reading the given file line wise. This method is similar to readTextFile, but it produces a DataSet with mutable StringValue objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and garbage collection heavy.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    charsetName

    The name of the character set used to read the file. Default is UTF-8.

  42. def registerCachedFile(filePath: String, name: String, executable: Boolean = false): Unit

    Registers a file at the distributed cache under the given name.

    Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (as long as all relevant workers have access to it), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.

    The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache

    filePath

    The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")

    name

    The name under which the file is registered.

    executable

    Flag indicating whether the file should be executable
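
    A brief sketch with a placeholder file; the locally cached copy is read back inside a rich user-defined function:

      import org.apache.flink.api.common.functions.RichMapFunction
      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.registerCachedFile("hdfs://namenode:9000/shared/lookup.txt", "lookup")

      val input = env.fromElements("a", "b", "c")
      val enriched = input.map(new RichMapFunction[String, String] {
        override def map(value: String): String = {
          // The runtime has copied the registered file to a local path on the worker.
          val localFile: java.io.File =
            getRuntimeContext.getDistributedCache.getFile("lookup")
          // ... use localFile to enrich the record (omitted) ...
          value
        }
      })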

  43. def registerType(typeClass: Class[_]): Unit

    Registers the given type with the serialization stack.

    Registers the given type with the serialization stack. If the type is eventually serialized as a POJO, then the type is registered with the POJO serializer. If the type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.
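
    For example, with a hypothetical user type:

      import org.apache.flink.api.scala._

      // Hypothetical type; if Kryo ends up serializing it, registering it ensures
      // that only a tag is written instead of the full class name.
      class MyPojo {
        var id: Int = 0
        var name: String = _
      }

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.registerType(classOf[MyPojo])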

  44. def registerTypeWithKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit

    Registers the given type with the serializer at the KryoSerializer.

  45. def registerTypeWithKryoSerializer[T <: Serializer[_] with Serializable](clazz: Class[_], serializer: T): Unit

    Registers the given type with the serializer at the KryoSerializer.

    Registers the given type with the serializer at the KryoSerializer.

    Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.

  46. def setParallelism(parallelism: Int): Unit

    Sets the parallelism for operations executed through this environment.

    Sets the parallelism for operations executed through this environment. Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instances. This value can be overridden by specific operations using DataSet.setParallelism.
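
    For instance:

      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.setParallelism(4)                              // default for all operators

      val data = env.fromElements(1, 2, 3, 4)
      // An individual operation can still override the environment default:
      val incremented = data.map(_ + 1).setParallelism(1)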

  47. def setRestartStrategy(restartStrategyConfiguration: RestartStrategyConfiguration): Unit

    Sets the restart strategy configuration.

    Sets the restart strategy configuration. The configuration specifies which restart strategy will be used for the execution graph in case of a restart.

    restartStrategyConfiguration

    Restart strategy configuration to be set

    Annotations
    @PublicEvolving()
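
    For example, a fixed-delay strategy (the attempt count and delay are illustrative):

      import java.util.concurrent.TimeUnit
      import org.apache.flink.api.common.restartstrategy.RestartStrategies
      import org.apache.flink.api.common.time.Time
      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
        3,                                 // number of restart attempts
        Time.of(10, TimeUnit.SECONDS)))    // delay between attempts
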
  48. def setSessionTimeout(timeout: Long): Unit

    Sets the session timeout to hold the intermediate results of a job.

    Sets the session timeout to hold the intermediate results of a job. The updated timeout only applies to future executions.

    timeout

    The timeout in seconds.

    Annotations
    @PublicEvolving()
  49. def startNewSession(): Unit

    Starts a new session, discarding all intermediate results.

    Starts a new session, discarding all intermediate results.

    Annotations
    @PublicEvolving()
  50. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  51. def toString(): String

    Definition Classes
    AnyRef → Any
  52. def union[T](sets: Seq[DataSet[T]]): DataSet[T]

  53. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  54. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def createHadoopInput[K, V](mapreduceInputFormat: InputFormat[K, V], key: Class[K], value: Class[V], job: Job)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from the given org.apache.hadoop.mapreduce.InputFormat.

    Creates a DataSet from the given org.apache.hadoop.mapreduce.InputFormat.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#createHadoopInput from the flink-hadoop-compatibility module.

  2. def createHadoopInput[K, V](mapredInputFormat: InputFormat[K, V], key: Class[K], value: Class[V], job: JobConf)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from the given org.apache.hadoop.mapred.InputFormat.

    Creates a DataSet from the given org.apache.hadoop.mapred.InputFormat.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#createHadoopInput from the flink-hadoop-compatibility module.

  3. def getNumberOfExecutionRetries: Int

    Gets the number of times the system will try to re-execute failed tasks.

    Gets the number of times the system will try to re-execute failed tasks. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    This method will be replaced by getRestartStrategy. The FixedDelayRestartStrategyConfiguration contains the number of execution retries.

  4. def readHadoopFile[K, V](mapreduceInputFormat: FileInputFormat[K, V], key: Class[K], value: Class[V], inputPath: String)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from the given org.apache.hadoop.mapreduce.lib.input.FileInputFormat.

    Creates a DataSet from the given org.apache.hadoop.mapreduce.lib.input.FileInputFormat. A org.apache.hadoop.mapreduce.Job with the given inputPath will be created.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#readHadoopFile from the flink-hadoop-compatibility module.

  5. def readHadoopFile[K, V](mapreduceInputFormat: FileInputFormat[K, V], key: Class[K], value: Class[V], inputPath: String, job: Job)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from the given org.apache.hadoop.mapreduce.lib.input.FileInputFormat.

    Creates a DataSet from the given org.apache.hadoop.mapreduce.lib.input.FileInputFormat. The given inputName is set on the given job.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#readHadoopFile from the flink-hadoop-compatibility module.

  6. def readHadoopFile[K, V](mapredInputFormat: FileInputFormat[K, V], key: Class[K], value: Class[V], inputPath: String)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from the given org.apache.hadoop.mapred.FileInputFormat.

    Creates a DataSet from the given org.apache.hadoop.mapred.FileInputFormat. A org.apache.hadoop.mapred.JobConf with the given inputPath is created.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#readHadoopFile from the flink-hadoop-compatibility module.

  7. def readHadoopFile[K, V](mapredInputFormat: FileInputFormat[K, V], key: Class[K], value: Class[V], inputPath: String, job: JobConf)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from the given org.apache.hadoop.mapred.FileInputFormat.

    Creates a DataSet from the given org.apache.hadoop.mapred.FileInputFormat. The given inputName is set on the given job.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#readHadoopFile from the flink-hadoop-compatibility module.

  8. def readSequenceFile[K, V](key: Class[K], value: Class[V], inputPath: String)(implicit tpe: TypeInformation[(K, V)]): DataSet[(K, V)]

    Creates a DataSet from org.apache.hadoop.mapred.SequenceFileInputFormat. A org.apache.hadoop.mapred.JobConf with the given inputPath is created.

    Creates a DataSet from org.apache.hadoop.mapred.SequenceFileInputFormat. A org.apache.hadoop.mapred.JobConf with the given inputPath is created.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    Please use org.apache.flink.hadoopcompatibility.scala.HadoopInputs#readSequenceFile from the flink-hadoop-compatibility module.

  9. def setNumberOfExecutionRetries(numRetries: Int): Unit

    Sets the number of times that failed tasks are re-executed.

    Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    This method will be replaced by setRestartStrategy(). The FixedDelayRestartStrategyConfiguration contains the number of execution retries.
