It supports to convert between PyFlink Table and Pandas DataFrame.
It supports creating a PyFlink Table from a Pandas DataFrame. Internally, it will serialize the Pandas DataFrame using Arrow columnar format at client side and the serialized data will be processed and deserialized in Arrow source during execution. The Arrow source could also be used in streaming jobs and it will properly handle the checkpoint and provides the exactly once guarantees.
The following example shows how to create a PyFlink Table from a Pandas DataFrame:
It also supports converting a PyFlink Table to a Pandas DataFrame. Internally, it will materialize the results of the table and serialize them into multiple Arrow batches of Arrow columnar format at client side. The maximum Arrow batch size is determined by the config option python.fn-execution.arrow.batch.size. The serialized data will then be converted to Pandas DataFrame. It will collect the content of the table to the client side and so please make sure that the content of the table could fit in memory before calling this method. You can limit the number of rows collected to client side via Table.limit
The following example shows how to convert a PyFlink Table to a Pandas DataFrame: