Common Questions

This page describes the solutions to some common questions for PyFlink users.

Preparing Python Virtual Environment

You can download a convenience script to prepare a Python virtual env zip which can be used on Mac OS and most Linux distributions. You can specify the PyFlink version to generate a Python virtual environment required for the corresponding PyFlink version, otherwise the most recent version will be installed.

$ sh setup-pyflink-virtual-env.sh 1.11.0

After setting up a python virtual environment, as described in the previous section, you should activate the environment before executing the PyFlink job.

Local

# activate the conda python virtual environment
$ source venv/bin/activate
$ python xxx.py

Cluster

$ # specify the Python virtual environment
$ table_env.add_python_archive("venv.zip")
$ # specify the path of the python interpreter which is used to execute the python UDF workers
$ table_env.get_config().set_python_executable("venv.zip/venv/bin/python")

For details on the usage of add_python_archive and set_python_executable, you can refer to the relevant documentation.

Adding Jar Files

A PyFlink job may depend on jar files, i.e. connectors, Java UDFs, etc. You can specify the dependencies with the following Python Table APIs or through command-line arguments directly when submitting the job.

# NOTE: Only local file URLs (start with "file:") are supported.
table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/udf.jar")

# NOTE: The Paths must specify a protocol (e.g. "file") and users should ensure that the URLs are accessible on both the client and the cluster.
table_env.get_config().get_configuration().set_string("pipeline.classpaths", "file:///my/jar/path/connector.jar;file:///my/jar/path/udf.jar")

For details about the APIs of adding Java dependency, you can refer to the relevant documentation

Adding Python Files

You can use the command-line arguments pyfs or the API add_python_file of TableEnvironment to add python file dependencies which could be python files, python packages or local directories. For example, if you have a directory named myDir which has the following hierarchy:

myDir
├──utils
    ├──__init__.py
    ├──my_util.py

You can add the Python files of directory myDir as following:

table_env.add_python_file('myDir')

def my_udf():
    from utils import my_util