Registers a default serializer for the given class and its sub-classes with Kryo.
Registers a default serializer for the given class and its sub-classes with Kryo.
Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
Registers a default serializer for the given class and its sub-classes with Kryo.
Generic method to create an input DataSet with an org.apache.flink.api.common.io.InputFormat.
Creates the program's org.apache.flink.api.common.Plan.
Creates the program's org.apache.flink.api.common.Plan. The plan is a description of all data sources, data sinks, and operations and how they interact, as an isolated unit that can be executed with a org.apache.flink.api.common.PlanExecutor. Obtaining a plan and starting it with an executor is an alternative way to run a program and is only possible if the program only consists of distributed operations.
Triggers the program execution.
Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with the given name.
The result of the job execution, containing elapsed time and accumulators.
Triggers the program execution.
Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with a generated default name.
The result of the job execution, containing elapsed time and accumulators.
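As a sketch of how the pieces above fit together (assuming the Flink Scala API, org.apache.flink.api.scala, is on the classpath; the output path is hypothetical): sources and transformations are only declared, and nothing runs until execute is called.

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Source -> transformations -> sink; nothing executes yet.
val counts = env
  .fromElements("to be", "or not", "to be")
  .flatMap(_.split(" "))
  .map((_, 1))
  .groupBy(0)
  .sum(1)

counts.writeAsText("file:///tmp/wordcount-out") // hypothetical output path

// Only now is the plan built and run; the result carries the elapsed
// time and any accumulator values.
val result = env.execute("WordCount sketch")
println(s"Runtime: ${result.getNetRuntime} ms")
```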
Creates a DataSet from the given Iterator.
Creates a DataSet from the given Iterator.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
Creates a DataSet from the given non-empty Iterable.
Creates a DataSet from the given non-empty Iterable.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
Creates a new data set that contains the given elements.
Creates a new data set that contains the given elements.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
Creates a new data set that contains elements in the iterator.
Creates a new data set that contains elements in the iterator. The iterator is splittable, allowing the framework to create a parallel data source that returns the elements in the iterator.
Creates a new data set that contains a sequence of numbers.
Creates a new data set that contains a sequence of numbers. The data set will be created in parallel, so there is no guarantee about the order of the elements.
The number to start at (inclusive).
The number to stop at (inclusive).
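A minimal sketch (assuming the Flink Scala API is on the classpath) of generating such a sequence and aggregating it:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Numbers 1..100, both endpoints inclusive; the source is parallel,
// so the order of elements is not deterministic.
val numbers: DataSet[Long] = env.generateSequence(1, 100)

// Order-insensitive aggregations are safe on such a source.
val total = numbers.reduce(_ + _)
```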
Gets the config object.
Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph.
Gets the UUID by which this environment is identified.
Gets the UUID by which this environment is identified. The UUID identifies the execution context in the cluster or local environment.
Gets the UUID by which this environment is identified, as a string.
Gets the UUID by which this environment is identified, as a string.
The Java ExecutionEnvironment that this environment wraps.
Gets the JobExecutionResult of the last executed job.
Returns the default parallelism for this execution environment.
Returns the default parallelism for this execution environment. Note that this value can be overridden by individual operations using DataSet.setParallelism.
Returns the specified restart strategy configuration.
Returns the specified restart strategy configuration.
The restart strategy configuration to be used
Gets the session timeout for this environment.
Gets the session timeout for this environment. The session timeout defines how long after an execution the job and its intermediate results will be kept for future interactions.
The session timeout, in seconds.
Creates a DataSet by reading the given CSV file.
Creates a DataSet by reading the given CSV file. The type parameter must be used to specify a Tuple type that has the same number of fields as there are fields in the CSV file. If the number of fields in the CSV file is not the same, the includedFields parameter can be used to only read specific fields.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The string that separates lines, defaults to newline.
The string that separates individual fields, defaults to ",".
The character to use for quoted String parsing, disabled by default.
Whether the first line in the file should be ignored.
Lines that start with the given String are ignored, disabled by default.
Whether the parser should silently ignore malformed lines.
The fields in the file that should be read. Per default all fields are read.
The fields of the POJO which are mapped to CSV fields.
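The parameters above can be combined as in the following sketch (assuming the Flink Scala API; the file path and column layout are hypothetical — a three-column file "name;city;age" of which only fields 0 and 2 are read):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Read only fields 0 and 2 into a Tuple2, skipping a header row.
// Note: the Tuple arity (2) matches the number of *included* fields.
val persons = env.readCsvFile[(String, Int)](
  "file:///tmp/persons.csv", // hypothetical path
  fieldDelimiter = ";",
  ignoreFirstLine = true,
  includedFields = Array(0, 2))
```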
Creates a new DataSource by reading the specified file using the custom org.apache.flink.api.common.io.FileInputFormat.
Creates a DataSet that represents the primitive type produced by reading the given file in a delimited way. This method is similar to readCsvFile with a single field, but it produces a DataSet of primitive types rather than Tuples.
Creates a DataSet that represents the primitive type produced by reading the given file in a delimited way. This method is similar to readCsvFile with a single field, but it produces a DataSet of primitive types rather than Tuples. The type parameter must be used to specify the primitive type.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The string that separates primitives, defaults to newline.
Creates a DataSet of Strings produced by reading the given file line wise.
Creates a DataSet of Strings produced by reading the given file line wise.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The name of the character set used to read the file. Default is UTF-8.
Creates a DataSet of Strings produced by reading the given file line wise.
Creates a DataSet of Strings produced by reading the given file line wise. This method is similar to readTextFile, but it produces a DataSet with mutable StringValue objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and garbage collection heavy.
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
The name of the character set used to read the file. Default is UTF-8.
Registers a file at the distributed cache under the given name.
Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (as long as all relevant workers have access to them), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.
The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache.
The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")
The name under which the file is registered.
Flag indicating whether the file should be executable
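A sketch of the registration-and-lookup round trip described above (assuming the Flink Scala API; the HDFS path and the "lookup" name are hypothetical):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import java.io.File

val env = ExecutionEnvironment.getExecutionEnvironment

// Hypothetical lookup file; every worker receives a local copy.
env.registerCachedFile("hdfs://host:port/data/lookup.txt", "lookup")

val enriched = env.fromElements("a", "b").map(
  new RichMapFunction[String, String] {
    private var lookupFile: File = _

    override def open(parameters: Configuration): Unit = {
      // Resolve the registered name to the worker-local copy.
      lookupFile = getRuntimeContext.getDistributedCache.getFile("lookup")
    }

    override def map(value: String): String =
      s"$value -> ${lookupFile.getName}"
  })
```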
Registers the given type with the serialization stack.
Registers the given type with the serialization stack. If the type is eventually serialized as a POJO, then the type is registered with the POJO serializer. If the type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.
Registers the given type with the serializer at the KryoSerializer.
Registers the given type with the serializer at the KryoSerializer.
Registers the given type with the serializer at the KryoSerializer.
Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
Sets the parallelism for operations executed through this environment.
Sets the parallelism for operations executed through this environment. Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instances. This value can be overridden by specific operations using DataSet.setParallelism.
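The interplay between the environment default and a per-operator override can be sketched as follows (assuming the Flink Scala API; the parallelism values are arbitrary):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4) // default for all operators in this environment

val counts = env
  .fromElements("a", "b", "a")
  .map((_, 1))
  .groupBy(0)
  .sum(1)
  .setParallelism(1) // per-operator override of the environment default
```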
Sets the restart strategy configuration.
Sets the restart strategy configuration. The configuration specifies which restart strategy will be used for the execution graph in case of a restart.
Restart strategy configuration to be set
Sets the session timeout to hold the intermediate results of a job.
Sets the session timeout to hold the intermediate results of a job. The updated timeout takes effect only for future executions.
The timeout in seconds.
Starts a new session, discarding all intermediate results.
Starts a new session, discarding all intermediate results.
Gets the number of times the system will try to re-execute failed tasks.
Gets the number of times the system will try to re-execute failed tasks. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.
This method will be replaced by getRestartStrategy. The FixedDelayRestartStrategyConfiguration contains the number of execution retries.
Sets the number of times that failed tasks are re-executed.
Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.
This method will be replaced by setRestartStrategy(). The FixedDelayRestartStrategyConfiguration contains the number of execution retries.
The ExecutionEnvironment is the context in which a program is executed. A local environment will cause execution in the current JVM, a remote environment will cause execution on a remote cluster installation.
The environment provides methods to control the job execution (such as setting the parallelism) and to interact with the outside world (data access).
To get an execution environment use the methods on the companion object:
Use ExecutionEnvironment#getExecutionEnvironment to get the correct environment depending on where the program is executed. If it is run inside an IDE, a local environment will be created. If the program is submitted to a cluster, a remote execution environment will be created.
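A minimal sketch of obtaining and using the environment via the companion object (assuming the Flink Scala API is on the classpath):

```scala
import org.apache.flink.api.scala._

// Resolves to a local environment inside an IDE or test, and to a
// remote environment when the program is submitted to a cluster.
val env = ExecutionEnvironment.getExecutionEnvironment

val doubled = env.fromElements(1, 2, 3).map(_ * 2)
```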