Configuration

Depending on the requirements of a Python Table API program, it might be necessary to adjust certain parameters for optimization. All the config options available for Java/Scala Table API programs can also be used in Python Table API programs. Refer to the Table API Configuration page for details on all the available config options for Java/Scala Table API programs. That page also provides examples of how to set config options in a Table API program.
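As a rough illustration of setting options by key, the sketch below applies a few of the Python options to a configuration object. It assumes the object exposes a `set_string(key, value)` method, which is how the `Configuration` returned by `table_env.get_config().get_configuration()` is commonly used in PyFlink. The `RecordingConfiguration` class, the `apply_options` helper, and the chosen values are hypothetical and exist only so the example runs without a Flink installation.

```python
class RecordingConfiguration:
    """Hypothetical stand-in for a Flink Configuration; it only records values."""

    def __init__(self):
        self.values = {}

    def set_string(self, key, value):
        self.values[key] = value
        return self


def to_config_string(value):
    # Write booleans in lower case ("true"/"false"); everything else via str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def apply_options(configuration, options):
    """Set every option on any object that offers set_string(key, value)."""
    for key, value in options.items():
        configuration.set_string(key, to_config_string(value))
    return configuration


# Illustrative values only; these are not tuning recommendations.
python_options = {
    "python.fn-execution.bundle.size": 2000,
    "python.fn-execution.bundle.time": 500,
    "python.metric.enabled": False,
}

# In a real program, the assumption is that you would pass
# table_env.get_config().get_configuration() instead of the stand-in.
config = apply_options(RecordingConfiguration(), python_options)
print(config.values)
```

Keeping the options in a plain dictionary makes it easy to review them in one place before they are applied.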

Python Options

Key Default Type Description
python.fn-execution.arrow.batch.size
1000 Integer The maximum number of elements to include in an Arrow batch for Python user-defined function execution. The Arrow batch size should not exceed the bundle size; if it does, the bundle size is used as the Arrow batch size.
python.fn-execution.buffer.memory.size
"15mb" String The amount of memory to be allocated to the input buffer and output buffer of a Python worker. The memory is accounted as managed memory if the actual memory allocated to an operator is no less than the total memory of a Python worker. Otherwise, this configuration has no effect.
python.fn-execution.bundle.size
1000 Integer The maximum number of elements to include in a bundle for Python user-defined function execution. The elements are processed asynchronously. One bundle of elements is processed before the next bundle is started. A larger value can improve throughput, but at the cost of more memory usage and higher latency.
python.fn-execution.bundle.time
1000 Long Sets the waiting timeout (in milliseconds) before processing a bundle for Python user-defined function execution. The timeout defines how long the elements of a bundle are buffered before being processed. Lower timeouts lead to lower tail latencies, but may reduce throughput.
python.fn-execution.framework.memory.size
"64mb" String The amount of memory to be allocated to the Python framework. The sum of this value and "python.fn-execution.buffer.memory.size" is the total memory of a Python worker. The memory is accounted as managed memory if the actual memory allocated to an operator is no less than the total memory of a Python worker. Otherwise, this configuration has no effect.
python.metric.enabled
true Boolean When set to false, metrics for Python are disabled. Disabling metrics can improve performance in some circumstances.
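
The relationships between some of these options can be worked through with a little arithmetic. The sketch below is not Flink code. It only restates the rules from the descriptions above: the Arrow batch size is capped by the bundle size, and the total memory of a Python worker is the framework memory plus the buffer memory. All function names are hypothetical, the parser only understands sizes written as "<n>mb", and the operator memory figures of 128 and 64 are made up for illustration.

```python
def parse_megabytes(size):
    """Parse a string such as "15mb" into an integer number of megabytes."""
    text = size.strip().lower()
    if not text.endswith("mb"):
        raise ValueError("this sketch only handles sizes written as '<n>mb'")
    return int(text[:-2])


def effective_arrow_batch_size(arrow_batch_size, bundle_size):
    # If the Arrow batch size exceeds the bundle size, the bundle size is used.
    return min(arrow_batch_size, bundle_size)


def python_worker_memory_mb(framework_memory, buffer_memory):
    # Total Python worker memory = framework memory + buffer memory.
    return parse_megabytes(framework_memory) + parse_megabytes(buffer_memory)


def counts_as_managed_memory(operator_memory_mb, worker_memory_mb):
    # The memory settings only apply when the operator has been given
    # at least as much memory as the Python worker needs in total.
    return operator_memory_mb >= worker_memory_mb


# Defaults from the table: "64mb" framework memory and "15mb" buffer memory.
total_mb = python_worker_memory_mb("64mb", "15mb")
print(total_mb)  # 79

# An Arrow batch size of 5000 with the default bundle size of 1000.
print(effective_arrow_batch_size(5000, 1000))  # 1000

# Hypothetical operator memory allocations, in megabytes.
print(counts_as_managed_memory(128, total_mb))  # True
print(counts_as_managed_memory(64, total_mb))   # False
```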