This section is relevant for programs running on event time. For an introduction to event time,
processing time, and ingestion time, please refer to the introduction to event time.
To work with event time, streaming programs need to set the time characteristic accordingly.
In order to work with event time, Flink needs to know the events’ timestamps, meaning each element in the
stream needs to have its event timestamp assigned. This is usually done by accessing/extracting the
timestamp from some field in the element.
Timestamp assignment goes hand-in-hand with generating watermarks, which tell the system about
progress in event time.
There are two ways to assign timestamps and generate watermarks:
Directly in the data stream source
Via a timestamp assigner / watermark generator: in Flink timestamp assigners also define the watermarks to be emitted
Attention Both timestamps and watermarks are specified as
milliseconds since the Java epoch of 1970-01-01T00:00:00Z.
Source Functions with Timestamps and Watermarks
Stream sources can also directly assign timestamps to the elements they produce, and they can also emit watermarks.
When this is done, no timestamp assigner is needed.
Note that if a timestamp assigner is used, any timestamps and watermarks provided by the source will be overwritten.
To assign a timestamp to an element in the source directly, the source must use the collectWithTimestamp(...)
method on the SourceContext. To generate watermarks, the source must call the emitWatermark(Watermark) function.
Below is a simple example of a (non-checkpointed) source that assigns timestamps and generates watermarks:
Timestamp Assigners / Watermark Generators
Timestamp assigners take a stream and produce a new stream with timestamped elements and watermarks. If the
original stream had timestamps and/or watermarks already, the timestamp assigner overwrites them.
Timestamp assigners are usually specified immediately after the data source, but it is not strictly required to do so.
A common pattern, for example, is to parse (MapFunction) and filter (FilterFunction) before the timestamp assigner.
In any case, the timestamp assigner needs to be specified before the first operation on event time
(such as the first window operation). As a special case, when using Kafka as the source of a streaming job,
Flink allows the specification of a timestamp assigner / watermark emitter inside
the source (or consumer) itself. More information on how to do so can be found in the
Kafka Connector documentation.
NOTE: The remainder of this section presents the main interfaces a programmer has
to implement in order to create her own timestamp extractors/watermark emitters.
To see the pre-implemented extractors that ship with Flink, please refer to the
Pre-defined Timestamp Extractors / Watermark Emitters page.
With Periodic Watermarks
AssignerWithPeriodicWatermarks assigns timestamps and generates watermarks periodically (possibly depending
on the stream elements, or purely based on processing time).
The interval (every n milliseconds) in which the watermark will be generated is defined via
ExecutionConfig.setAutoWatermarkInterval(...). The assigner’s getCurrentWatermark() method will be
called each time, and a new watermark will be emitted if the returned watermark is non-null and larger than the previous
Two simple examples of timestamp assigners with periodic watermark generation are below.
With Punctuated Watermarks
To generate watermarks whenever a certain event indicates that a new watermark might be generated, use
AssignerWithPunctuatedWatermarks. For this class Flink will first call the extractTimestamp(...) method
to assign the element a timestamp, and then immediately call the
checkAndGetNextWatermark(...) method on that element.
The checkAndGetNextWatermark(...) method is passed the timestamp that was assigned in the extractTimestamp(...)
method, and can decide whether it wants to generate a watermark. Whenever the checkAndGetNextWatermark(...)
method returns a non-null watermark, and that watermark is larger than the latest previous watermark, that
new watermark will be emitted.
Note: It is possible to generate a watermark on every single event. However, because each watermark causes some
computation downstream, an excessive number of watermarks degrades performance.
Timestamps per Kafka Partition
When using Apache Kafka as a data source, each Kafka partition may have a simple event time pattern (ascending
timestamps or bounded out-of-orderness). However, when consuming streams from Kafka, multiple partitions often get consumed in parallel,
interleaving the events from the partitions and destroying the per-partition patterns (this is inherent in how Kafka’s consumer clients work).
In that case, you can use Flink’s Kafka-partition-aware watermark generation. Using that feature, watermarks are generated inside the
Kafka consumer, per Kafka partition, and the per-partition watermarks are merged in the same way as watermarks are merged on stream shuffles.
For example, if event timestamps are strictly ascending per Kafka partition, generating per-partition watermarks with the
ascending timestamps watermark generator will result in perfect overall watermarks.
The illustrations below show how to use the per-Kafka-partition watermark generation, and how watermarks propagate through the
streaming dataflow in that case.