IN - Type of the input elements.
OUT - Type of the output elements.
UDFIN - Type of the UDF input.
UDFOUT - Type of the UDF output.

@Internal public abstract class AbstractPythonScalarFunctionOperator<IN,OUT,UDFIN,UDFOUT> extends AbstractPythonFunctionOperator<IN,OUT>
Abstract base operator for Python ScalarFunctions. It executes the Python ScalarFunctions in a separate Python execution environment.
The input is assumed to have the following format:

    +------------------+--------------+
    | forwarded fields | extra fields |
    +------------------+--------------+

The Python UDFs may take input columns directly from the input row or from the execution results of Java UDFs:
1) Input columns from the input row are referenced through the 'forwarded fields'.
2) The Java UDFs are computed first, and their execution results are referenced through the 'extra fields'.

The output has the following format:

    +------------------+-------------------------+
    | forwarded fields | scalar function results |
    +------------------+-------------------------+
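
The row layout described above can be illustrated with a small, self-contained sketch. It is not Flink code: rows are modeled as plain `Object[]` arrays, the class `RowLayoutSketch` and its helpers `project` and `join` are hypothetical names, and the Python scalar function is replaced by an in-process addition. The offsets play the roles of the `forwardedFields` and `udfInputOffsets` fields documented on this page.

```java
import java.util.Arrays;

/** Illustrative stand-in, not part of Flink. Rows are modeled as Object[]. */
final class RowLayoutSketch {

    /** Picks the fields at the given offsets out of a row. */
    static Object[] project(Object[] row, int[] offsets) {
        Object[] projected = new Object[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            projected[i] = row[offsets[i]];
        }
        return projected;
    }

    /** Concatenates forwarded fields and scalar function results into one output row. */
    static Object[] join(Object[] forwarded, Object[] udfResults) {
        Object[] out = Arrays.copyOf(forwarded, forwarded.length + udfResults.length);
        System.arraycopy(udfResults, 0, out, forwarded.length, udfResults.length);
        return out;
    }

    public static void main(String[] args) {
        // Input row: two forwarded fields followed by one extra field
        // (the extra field stands for the result of a Java UDF).
        Object[] input = {"alice", 3, 30};
        int[] forwardedFields = {0, 1};   // hypothetical offsets
        int[] udfInputOffsets = {1, 2};   // hypothetical offsets

        Object[] udfInput = project(input, udfInputOffsets);
        // Stand-in for the Python scalar function: adds its two integer arguments.
        Object[] udfResults = {(Integer) udfInput[0] + (Integer) udfInput[1]};

        Object[] output = join(project(input, forwardedFields), udfResults);
        System.out.println(Arrays.toString(output)); // prints [alice, 3, 33]
    }
}
```

In this sketch the extra field is consumed as a UDF argument but is not forwarded, so the output row contains only the forwarded fields followed by the scalar function result.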
AbstractStreamOperator.CountingOutput<OUT>
Modifier and Type | Field and Description |
---|---|
protected int[] | forwardedFields: The offsets of the fields which should be forwarded. |
protected LinkedBlockingQueue<IN> | forwardedInputQueue: The queue holding the input elements for which the execution results have not been received. |
protected RowType | inputType: The input logical type. |
protected RowType | outputType: The output logical type. |
protected PythonFunctionInfo[] | scalarFunctions: The Python ScalarFunctions to be executed. |
protected int[] | udfInputOffsets: The offsets of the UDF inputs. |
protected RowType | udfInputType: The UDF input logical type. |
protected RowType | udfOutputType: The UDF output logical type. |
protected LinkedBlockingQueue<UDFOUT> | udfResultQueue: The queue holding the user-defined function execution results. |
chainingStrategy, latencyStats, LOG, metrics, output, timeServiceManager
Modifier and Type | Method and Description |
---|---|
abstract void | bufferInput(IN input): Buffers the specified input; it will be used to construct the operator result together with the UDF execution result. |
PythonFunctionRunner<IN> | createPythonFunctionRunner(): Creates the PythonFunctionRunner which is responsible for Python user-defined function execution. |
abstract PythonFunctionRunner<UDFIN> | createPythonFunctionRunner(org.apache.beam.sdk.fn.data.FnDataReceiver<UDFOUT> resultReceiver, PythonEnvironmentManager pythonEnvironmentManager) |
PythonEnv | getPythonEnv(): Returns the PythonEnv used to create PythonEnvironmentManager. |
abstract UDFIN | getUdfInput(IN element) |
void | open(): This method is called immediately before any elements are processed; it should contain the operator's initialization logic. |
void | processElement(StreamRecord<IN> element): Processes one element that arrived at this operator. |
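
The interplay between `bufferInput`, `forwardedInputQueue`, and `udfResultQueue` can be sketched roughly as follows. This is a simplified stand-in, not the operator's actual implementation: the class `BufferAndEmitSketch`, the `onUdfResult` callback, and the string-based row representation are all hypothetical, and the sketch assumes that UDF results arrive in the same order as the buffered inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative stand-in, not Flink code. Inputs are strings, UDF results are integers. */
final class BufferAndEmitSketch {
    // Inputs whose UDF results have not been received yet, in arrival order.
    private final LinkedBlockingQueue<String> forwardedInputQueue = new LinkedBlockingQueue<>();
    // UDF results received from the (simulated) Python side.
    private final LinkedBlockingQueue<Integer> udfResultQueue = new LinkedBlockingQueue<>();
    // Collected output rows; a real operator would emit them downstream instead.
    private final List<String> emitted = new ArrayList<>();

    /** Remembers the input so it can later be joined with its UDF result. */
    void bufferInput(String input) {
        forwardedInputQueue.add(input);
    }

    /** Hypothetical callback invoked when a UDF result comes back. */
    void onUdfResult(Integer result) {
        udfResultQueue.add(result);
    }

    /** Pairs each available result with the oldest buffered input (assumes ordered results). */
    void emitResults() {
        Integer result;
        while ((result = udfResultQueue.poll()) != null) {
            String input = forwardedInputQueue.poll();
            if (input == null) {
                throw new IllegalStateException("Received a UDF result without a buffered input");
            }
            emitted.add(input + "|" + result);
        }
    }

    List<String> emitted() {
        return emitted;
    }
}
```

Inputs that have been buffered but whose results are still pending simply stay in `forwardedInputQueue` until a later call to `emitResults` finds a matching result.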
close, createPythonEnvironmentManager, dispose, emitResults, endInput, prepareSnapshotPreBarrier, processWatermark
getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getUserCodeClassloader, initializeState, initializeState, notifyCheckpointComplete, numEventTimeTimers, numProcessingTimeTimers, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processWatermark1, processWatermark2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setup, snapshotState, snapshotState
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
processLatencyMarker
getChainingStrategy, getMetricGroup, getOperatorID, initializeState, setChainingStrategy, setKeyContextElement1, setKeyContextElement2, snapshotState
notifyCheckpointComplete
getCurrentKey, setCurrentKey
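protected final PythonFunctionInfo[] scalarFunctions
The Python ScalarFunctions to be executed.
protected final RowType inputType
The input logical type.
protected final RowType outputType
The output logical type.
protected final int[] udfInputOffsets
The offsets of the UDF inputs.
protected final int[] forwardedFields
The offsets of the fields which should be forwarded.
protected transient RowType udfInputType
The UDF input logical type.
protected transient RowType udfOutputType
The UDF output logical type.
protected transient LinkedBlockingQueue<IN> forwardedInputQueue
The queue holding the input elements for which the execution results have not been received.
protected transient LinkedBlockingQueue<UDFOUT> udfResultQueue
The queue holding the user-defined function execution results.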
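public void open() throws Exception
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed; it should contain the operator's initialization logic. The default implementation does nothing.
open in interface StreamOperator<OUT>
open in class AbstractPythonFunctionOperator<IN,OUT>
Throws: Exception - An exception in this method causes the operator to fail.
public void processElement(StreamRecord<IN> element) throws Exception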
Description copied from interface: OneInputStreamOperator
Processes one element that arrived at this operator.
processElement in interface OneInputStreamOperator<IN,OUT>
processElement in class AbstractPythonFunctionOperator<IN,OUT>
Throws: Exception
public PythonEnv getPythonEnv()
Description copied from class: AbstractPythonFunctionOperator
Returns the PythonEnv used to create PythonEnvironmentManager.
getPythonEnv in class AbstractPythonFunctionOperator<IN,OUT>
public PythonFunctionRunner<IN> createPythonFunctionRunner() throws IOException
Description copied from class: AbstractPythonFunctionOperator
Creates the PythonFunctionRunner which is responsible for Python user-defined function execution.
createPythonFunctionRunner in class AbstractPythonFunctionOperator<IN,OUT>
Throws: IOException
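public abstract void bufferInput(IN input)
Buffers the specified input; it will be used to construct the operator result together with the UDF execution result.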
public abstract PythonFunctionRunner<UDFIN> createPythonFunctionRunner(org.apache.beam.sdk.fn.data.FnDataReceiver<UDFOUT> resultReceiver, PythonEnvironmentManager pythonEnvironmentManager)
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.