Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for any type of more elaborate operation. For example:
In order to make state fault tolerant, Flink needs to be aware of the state and checkpoint it. In many cases, Flink can also manage the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state.
This document explains how to use Flink’s state abstractions when developing an application.
There are two basic state backends:
Keyed State and
Keyed State is always relative to keys and can only be used in functions and operators on a
Examples of keyed state are the
ListState that one can create in a function on a
well as the state of a keyed window operator.
Keyed State is organized in so called Key Groups. Key Groups are the unit by which keyed state can be redistributed and there are as many key groups as the defined maximum parallelism. During execution each parallel instance of an operator gets one or more key groups.
Operator State is state per parallel subtask. It subsumes the
Checkpointed interface in Flink 1.0 and Flink 1.1.
CheckpointedFunction interface is basically a shortcut (syntactic sugar) for the Operator State.
Operator State needs special re-distribution schemes when parallelism is changed. There can be different variations of such schemes; the following are currently defined:
Keyed State and Operator State exist in two forms: managed and raw.
Managed State is represented in data structures controlled by the Flink runtime, such as internal hash tables, or RocksDB. Examples are “ValueState”, “ListState”, etc. Flink’s runtime encodes the states and writes them into the checkpoints.
Raw State is state that users and operators keep in their own data structures. When checkpointed, they only write a sequence of bytes into the checkpoint. Flink knows nothing about the state’s data structures and sees only the raw bytes.