Common Questions

This page describes the solutions to some common questions for PyFlink users.

Preparing Python Virtual Environment

You can download a convenience script that prepares a zipped Python virtual environment, which can be used on macOS and most Linux distributions. Pass a version parameter to generate the Python virtual environment required for the corresponding PyFlink version; otherwise the most recent version is installed.

$ sh setup-pyflink-virtual-env.sh 1.10.2

After setting up a Python virtual environment as described above, activate it before executing the PyFlink job.

Local

# activate the conda python virtual environment
$ source venv/bin/activate
$ python xxx.py

Cluster

# specify the Python virtual environment
table_env.add_python_archive("venv.zip")
# specify the path of the Python interpreter used to execute the Python UDF workers
table_env.get_config().set_python_executable("venv.zip/venv/bin/python")

For details on the usage of add_python_archive and set_python_executable, you can refer to the relevant documentation.
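
The path passed to set_python_executable combines the archive file name with a path inside the archive. A quick local check that the archive really contains the interpreter can catch a wrong path before the job is submitted. The following is a minimal sketch, not part of PyFlink: the helper name `archive_has_interpreter` is hypothetical, and it assumes the path has the form "<archive file name>/<path inside the archive>", as in the example above.

```python
import zipfile


def archive_has_interpreter(archive_path, executable):
    """Check that the interpreter path given to set_python_executable exists in the archive.

    `executable` is assumed to look like "<archive file name>/<path inside the archive>",
    for example "venv.zip/venv/bin/python".
    """
    # Drop the leading "<archive file name>/" component to get the inner path.
    _, _, inner_path = executable.partition("/")
    with zipfile.ZipFile(archive_path) as archive:
        return inner_path in archive.namelist()
```

For example, `archive_has_interpreter("venv.zip", "venv.zip/venv/bin/python")` should return True for an archive produced as described above.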

Adding Jar Files

A PyFlink job may depend on jar files, e.g. connectors, Java UDFs, etc. How you add the jar files depends on the deployment mode.

Local

You need to copy the jar files to the directory site-packages/pyflink/lib of the Python interpreter in use. You can execute the following command to find the path:

python -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__file__))+'/lib')"
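
The copy step itself can also be scripted. The following is a minimal sketch under stated assumptions: the helper names `pyflink_lib_dir` and `copy_jars` are hypothetical, and `pyflink_lib_dir` only works in an interpreter where PyFlink is installed. It locates the directory the same way as the command above.

```python
import os
import shutil
from pathlib import Path


def pyflink_lib_dir():
    """Return the lib directory of the installed pyflink package."""
    import pyflink  # imported lazily; requires PyFlink in this interpreter
    return Path(os.path.dirname(os.path.abspath(pyflink.__file__))) / "lib"


def copy_jars(jar_paths, lib_dir):
    """Copy each jar file into lib_dir and return the list of target paths."""
    lib_dir = Path(lib_dir)
    lib_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in jar_paths:
        target = lib_dir / Path(jar).name
        shutil.copy2(jar, target)
        copied.append(target)
    return copied
```

A call such as `copy_jars(["my-connector.jar"], pyflink_lib_dir())` would then place the jar next to the ones shipped with PyFlink; `my-connector.jar` is a placeholder name.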

Cluster

You can use the command-line argument -j <jarFile> to specify the jar file to use. For more details about the -j <jarFile> argument, you can refer to the relevant documentation.

Note: Currently, the Flink CLI only allows specifying one jar file. If the job needs several jar files, you can package them into one zip file as follows:

$ # create a directory named `lib`
$ mkdir lib
$ # move the jar files to the lib directory
$ # the jar files must be located in a directory named `lib`
$ zip -r lib.zip lib
$ flink run -py xxx.py -j lib.zip
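
The same archive can be built from Python when the jar files live in different places. The following is a minimal sketch using only the standard library; the helper name `build_lib_zip` is hypothetical. It stores every jar under a top-level `lib` directory inside the zip file, matching the layout required above.

```python
import zipfile
from pathlib import Path


def build_lib_zip(jar_paths, zip_path="lib.zip"):
    """Write the given jar files into zip_path under a top-level lib/ directory."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for jar in jar_paths:
            # arcname controls the path inside the archive: lib/<file name>
            archive.write(jar, arcname=f"lib/{Path(jar).name}")
    return zip_path
```

The resulting file can then be passed to `flink run` with `-j`, as in the shell example.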

Adding Python Files

You can use the command-line argument -pyfs or the add_python_file API of TableEnvironment to add Python file dependencies, which can be Python files, Python packages or local directories. For example, suppose you have a directory named myDir with the following hierarchy:

myDir
└──utils
    ├──__init__.py
    └──my_util.py

You can add the Python files of the directory myDir as follows:

table_env.add_python_file('myDir')

def my_udf():
    from utils import my_util
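
The import inside my_udf resolves because the added directory is put on the PYTHONPATH of the Python UDF worker, so the package `utils` is found directly beneath it. The following sketch reproduces that lookup locally without Flink, as an illustration only: the helper names and the `double` function written into my_util.py are hypothetical.

```python
import importlib
import sys
from pathlib import Path


def make_example_dir(root):
    """Create myDir/utils/{__init__.py,my_util.py} under root and return myDir."""
    utils = Path(root) / "myDir" / "utils"
    utils.mkdir(parents=True)
    (utils / "__init__.py").write_text("")
    # Hypothetical module content, only here to make the import observable.
    (utils / "my_util.py").write_text("def double(x):\n    return 2 * x\n")
    return utils.parent


def import_from_added_dir(directory):
    """Put directory on the module search path and import utils.my_util from it."""
    sys.path.insert(0, str(directory))
    importlib.invalidate_caches()
    return importlib.import_module("utils.my_util")
```

Note that it is myDir itself, not myDir/utils, that has to be on the path for `from utils import my_util` to work.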