This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

Python API

PyFlink

PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, machine learning (ML) pipelines, and ETL processes. If you are already familiar with Python and libraries such as pandas, PyFlink makes it simpler to use the full capabilities of the Flink ecosystem. PyFlink offers two APIs, which differ in their level of abstraction:

  • The PyFlink Table API lets you write powerful relational queries in a way that is similar to using SQL or working with tabular data in Python.
  • The PyFlink DataStream API gives you lower-level control over Flink's core building blocks, state and time, so that you can build more complex stream processing applications.

If you’re interested in playing around with Flink, try one of our tutorials.

The reference documentation covers all the details.

If you get stuck, check out our community support resources. In particular, Apache Flink’s user mailing list is consistently ranked as one of the most active mailing lists of any Apache project, and it is a great way to get help quickly.