Class/Object

org.apache.flink.api.scala

ExecutionEnvironment

Related Docs: object ExecutionEnvironment | package scala

Permalink

class ExecutionEnvironment extends AnyRef

The ExecutionEnvironment is the context in which a program is executed. A local environment will cause execution in the current JVM, a remote environment will cause execution on a remote cluster installation.

The environment provides methods to control the job execution (such as setting the parallelism) and to interact with the outside world (data access).

To get an execution environment use the methods on the companion object:

Use ExecutionEnvironment#getExecutionEnvironment to get the correct environment depending on where the program is executed. If it is run inside an IDE a local environment will be created. If the program is submitted to a cluster a remote execution environment will be created.

Annotations
@Public()
Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. ExecutionEnvironment
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new ExecutionEnvironment(javaEnv: java.ExecutionEnvironment)

    Permalink

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  4. def addDefaultKryoSerializer[T <: Serializer[_] with Serializable](clazz: Class[_], serializer: T): Unit

    Permalink

    Registers a default serializer for the given class and its sub-classes at Kryo.

    Registers a default serializer for the given class and its sub-classes at Kryo.

    Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.

  5. def addDefaultKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit

    Permalink

    Registers a default serializer for the given class and its sub-classes at Kryo.

  6. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  7. def clearJobListeners(): Unit

    Permalink

    Clear all registered JobListeners.

    Clear all registered JobListeners.

    Annotations
    @PublicEvolving()
  8. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def configure(configuration: ReadableConfig, classLoader: ClassLoader): Unit

    Permalink

    Sets all relevant options contained in the ReadableConfig such as e.g.

    Sets all relevant options contained in the ReadableConfig such as e.g. org.apache.flink.configuration.PipelineOptions#CACHED_FILES. It will reconfigure ExecutionEnvironment and ExecutionConfig.

    It will change the value of a setting only if a corresponding option was set in the configuration. If a key is not present, the current value of a field will remain untouched.

    configuration

    a configuration to read the values from

    classLoader

    a class loader to use when loading classes

    Annotations
    @PublicEvolving()
  10. def createInput[T](inputFormat: InputFormat[T, _])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Generic method to create an input DataSet with an org.apache.flink.api.common.io.InputFormat.

  11. def createProgramPlan(jobName: String = ""): Plan

    Permalink

    Creates the program's org.apache.flink.api.common.Plan.

    Creates the program's org.apache.flink.api.common.Plan. The plan is a description of all data sources, data sinks, and operations and how they interact, as an isolated unit that can be executed with an PipelineExecutor. Obtaining a plan and starting it with an executor is an alternative way to run a program and is only possible if the program only consists of distributed operations.

  12. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  14. def execute(jobName: String): JobExecutionResult

    Permalink

    Triggers the program execution.

    Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results DataSet.print, writing results (e.g. DataSet.writeAsText, DataSet.write, or other generic data sinks created with DataSet.output.

    The program execution will be logged and displayed with the given name.

    returns

    The result of the job execution, containing elapsed time and accumulators.

  15. def execute(): JobExecutionResult

    Permalink

    Triggers the program execution.

    Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results DataSet.print, writing results (e.g. DataSet.writeAsText, DataSet.write, or other generic data sinks created with DataSet.output.

    The program execution will be logged and displayed with a generated default name.

    returns

    The result of the job execution, containing elapsed time and accumulators.

  16. def executeAsync(jobName: String): JobClient

    Permalink

    Triggers the program execution asynchronously.

    Triggers the program execution asynchronously. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results DataSet.print, writing results (e.g. DataSet.writeAsText, DataSet.write, or other generic data sinks created with DataSet.output.

    The program execution will be logged and displayed with the given name.

    ATTENTION: The caller of this method is responsible for managing the lifecycle of the returned JobClient. This means calling JobClient#close() at the end of its usage. In other case, there may be resource leaks depending on the JobClient implementation.

    returns

    A JobClient that can be used to communicate with the submitted job, completed on submission succeeded.

    Annotations
    @PublicEvolving()
  17. def executeAsync(): JobClient

    Permalink

    Triggers the program execution asynchronously.

    Triggers the program execution asynchronously. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are for example printing results DataSet.print, writing results (e.g. DataSet.writeAsText, DataSet.write, or other generic data sinks created with DataSet.output.

    The program execution will be logged and displayed with a generated default name.

    ATTENTION: The caller of this method is responsible for managing the lifecycle of the returned JobClient. This means calling JobClient#close() at the end of its usage. In other case, there may be resource leaks depending on the JobClient implementation.

    returns

    A JobClient that can be used to communicate with the submitted job, completed on submission succeeded.

    Annotations
    @PublicEvolving()
  18. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def fromCollection[T](data: Iterator[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a DataSet from the given Iterator.

    Creates a DataSet from the given Iterator.

    Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

  20. def fromCollection[T](data: Iterable[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a DataSet from the given non-empty Iterable.

    Creates a DataSet from the given non-empty Iterable.

    Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

  21. def fromElements[T](data: T*)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a new data set that contains the given elements.

    Creates a new data set that contains the given elements.

    * Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.

  22. def fromParallelCollection[T](iterator: SplittableIterator[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a new data set that contains elements in the iterator.

    Creates a new data set that contains elements in the iterator. The iterator is splittable, allowing the framework to create a parallel data source that returns the elements in the iterator.

  23. def generateSequence(from: Long, to: Long): DataSet[Long]

    Permalink

    Creates a new data set that contains a sequence of numbers.

    Creates a new data set that contains a sequence of numbers. The data set will be created in parallel, so there is no guarantee about the oder of the elements.

    from

    The number to start at (inclusive).

    to

    The number to stop at (inclusive).

  24. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  25. def getConfig: ExecutionConfig

    Permalink

    Gets the config object.

  26. def getExecutionPlan(): String

    Permalink

    Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph.

  27. def getJavaEnv: java.ExecutionEnvironment

    Permalink

    returns

    the Java Execution environment.

  28. def getLastJobExecutionResult: JobExecutionResult

    Permalink

    Gets the JobExecutionResult of the last executed job.

  29. def getParallelism: Int

    Permalink

    Returns the default parallelism for this execution environment.

    Returns the default parallelism for this execution environment. Note that this value can be overridden by individual operations using DataSet.setParallelism

  30. def getRestartStrategy: RestartStrategyConfiguration

    Permalink

    Returns the specified restart strategy configuration.

    Returns the specified restart strategy configuration.

    returns

    The restart strategy configuration to be used

    Annotations
    @PublicEvolving()
  31. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  32. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  33. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  34. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  35. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  36. def readCsvFile[T](filePath: String, lineDelimiter: String = "\n", fieldDelimiter: String = ",", quoteCharacter: Character = null, ignoreFirstLine: Boolean = false, ignoreComments: String = null, lenient: Boolean = false, includedFields: Array[Int] = null, pojoFields: Array[String] = null)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a DataSet by reading the given CSV file.

    Creates a DataSet by reading the given CSV file. The type parameter must be used to specify a Tuple type that has the same number of fields as there are fields in the CSV file. If the number of fields in the CSV file is not the same, the includedFields parameter can be used to only read specific fields.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    lineDelimiter

    The string that separates lines, defaults to newline.

    fieldDelimiter

    The string that separates individual fields, defaults to ",".

    quoteCharacter

    The character to use for quoted String parsing, disabled by default.

    ignoreFirstLine

    Whether the first line in the file should be ignored.

    ignoreComments

    Lines that start with the given String are ignored, disabled by default.

    lenient

    Whether the parser should silently ignore malformed lines.

    includedFields

    The fields in the file that should be read. Per default all fields are read.

    pojoFields

    The fields of the POJO which are mapped to CSV fields.

  37. def readFile[T](inputFormat: FileInputFormat[T], filePath: String)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a new DataSource by reading the specified file using the custom org.apache.flink.api.common.io.FileInputFormat.

  38. def readFileOfPrimitives[T](filePath: String, delimiter: String = "\n")(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]

    Permalink

    Creates a DataSet that represents the primitive type produced by reading the given file in delimited way.This method is similar to readCsvFile with single field, but it produces a DataSet not through Tuple.

    Creates a DataSet that represents the primitive type produced by reading the given file in delimited way.This method is similar to readCsvFile with single field, but it produces a DataSet not through Tuple. The type parameter must be used to specify the primitive type.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    delimiter

    The string that separates primitives , defaults to newline.

  39. def readTextFile(filePath: String, charsetName: String = "UTF-8"): DataSet[String]

    Permalink

    Creates a DataSet of Strings produced by reading the given file line wise.

    Creates a DataSet of Strings produced by reading the given file line wise.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    charsetName

    The name of the character set used to read the file. Default is UTF-8

  40. def readTextFileWithValue(filePath: String, charsetName: String = "UTF-8"): DataSet[StringValue]

    Permalink

    Creates a DataSet of Strings produced by reading the given file line wise.

    Creates a DataSet of Strings produced by reading the given file line wise. This method is similar to readTextFile, but it produces a DataSet with mutable StringValue objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and garbage collection heavy.

    filePath

    The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

    charsetName

    The name of the character set used to read the file. Default is UTF-8

  41. def registerCachedFile(filePath: String, name: String, executable: Boolean = false): Unit

    Permalink

    Registers a file at the distributed cache under the given name.

    Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (which will be distributed via BlobServer), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.

    The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache

    filePath

    The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")

    name

    The name under which the file is registered.

    executable

    Flag indicating whether the file should be executable

  42. def registerJobListener(jobListener: JobListener): Unit

    Permalink

    Register a JobListener in this environment.

    Register a JobListener in this environment. The JobListener will be notified on specific job status changed.

    Annotations
    @PublicEvolving()
  43. def registerType(typeClass: Class[_]): Unit

    Permalink

    Registers the given type with the serialization stack.

    Registers the given type with the serialization stack. If the type is eventually serialized as a POJO, then the type is registered with the POJO serializer. If the type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.

  44. def registerTypeWithKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit

    Permalink

    Registers the given type with the serializer at the KryoSerializer.

  45. def registerTypeWithKryoSerializer[T <: Serializer[_] with Serializable](clazz: Class[_], serializer: T): Unit

    Permalink

    Registers the given type with the serializer at the KryoSerializer.

    Registers the given type with the serializer at the KryoSerializer.

    Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.

  46. def setParallelism(parallelism: Int): Unit

    Permalink

    Sets the parallelism (parallelism) for operations executed through this environment.

    Sets the parallelism (parallelism) for operations executed through this environment. Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instances. This value can be overridden by specific operations using DataSet.setParallelism.

  47. def setRestartStrategy(restartStrategyConfiguration: RestartStrategyConfiguration): Unit

    Permalink

    Sets the restart strategy configuration.

    Sets the restart strategy configuration. The configuration specifies which restart strategy will be used for the execution graph in case of a restart.

    restartStrategyConfiguration

    Restart strategy configuration to be set

    Annotations
    @PublicEvolving()
  48. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  49. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  50. def union[T](sets: Seq[DataSet[T]]): DataSet[T]

    Permalink
  51. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  52. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  53. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def getNumberOfExecutionRetries: Int

    Permalink

    Gets the number of times the system will try to re-execute failed tasks.

    Gets the number of times the system will try to re-execute failed tasks. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    This method will be replaced by getRestartStrategy. The FixedDelayRestartStrategyConfiguration contains the number of execution retries.

  2. def setNumberOfExecutionRetries(numRetries: Int): Unit

    Permalink

    Sets the number of times that failed tasks are re-executed.

    Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.

    Annotations
    @Deprecated @PublicEvolving()
    Deprecated

    This method will be replaced by setRestartStrategy(). The FixedDelayRestartStrategyConfiguration contains the number of execution retries.

Inherited from AnyRef

Inherited from Any

Ungrouped