T
- The type of emitted records.@Internal public class SourceOutputWithWatermarks<T> extends Object implements SourceOutput<T>
PushingAsyncDataInput.DataOutput
. The watermarks are pushed into the same output, or into
a separate WatermarkOutput
, if one is provided.
This output does not implement automatic periodic watermark emission. The method emitPeriodicWatermark()
needs to be called periodically.
The methods SourceOutput.collect(Object)
and SourceOutput.collect(Object,
long)
are highly performance-critical (part of the hot loop). To make the code as JIT friendly
as possible, we want to have only a single implementation of these two methods, across all
classes. That way, the JIT compiler can de-virtualize (and inline) them better.
Currently, we have one implementation of these methods for the case where we don't need
watermarks (see class NoOpTimestampsAndWatermarks
) and one for the case where we do (this
class). When the JVM is dedicated to a single job (or type of job) only one of these classes will
be loaded. In mixed job setups, we still have a bimorphic method (rather than a
poly/-/mega-morphic method).
Modifier | Constructor and Description |
---|---|
protected |
SourceOutputWithWatermarks(PushingAsyncDataInput.DataOutput<T> recordsOutput,
WatermarkOutput onEventWatermarkOutput,
WatermarkOutput periodicWatermarkOutput,
TimestampAssigner<T> timestampAssigner,
WatermarkGenerator<T> watermarkGenerator)
Creates a new SourceOutputWithWatermarks that emits records to the given DataOutput and
watermarks to the (possibly different) WatermarkOutput.
|
Modifier and Type | Method and Description |
---|---|
void |
collect(T record)
Emit a record without a timestamp.
|
void |
collect(T record,
long timestamp)
Emit a record with a timestamp.
|
static <E> SourceOutputWithWatermarks<E> |
createWithSeparateOutputs(PushingAsyncDataInput.DataOutput<E> recordsOutput,
WatermarkOutput onEventWatermarkOutput,
WatermarkOutput periodicWatermarkOutput,
TimestampAssigner<E> timestampAssigner,
WatermarkGenerator<E> watermarkGenerator)
Creates a new SourceOutputWithWatermarks that emits records to the given DataOutput and
watermarks to the different WatermarkOutputs.
|
void |
emitPeriodicWatermark() |
void |
emitWatermark(Watermark watermark)
Emits the given watermark.
|
void |
markActive()
Marks this output as active, meaning that downstream operations should wait for watermarks
from this output.
|
void |
markIdle()
Marks this output as idle, meaning that downstream operations do not wait for watermarks from
this output.
|
protected SourceOutputWithWatermarks(PushingAsyncDataInput.DataOutput<T> recordsOutput, WatermarkOutput onEventWatermarkOutput, WatermarkOutput periodicWatermarkOutput, TimestampAssigner<T> timestampAssigner, WatermarkGenerator<T> watermarkGenerator)
public final void collect(T record)
SourceOutput
Use this method if the source system does not have a notion of records with timestamps.
The events later pass through a TimestampAssigner
, which attaches a timestamp to
the event based on the event's contents. For example a file source with JSON records would
not have a generic timestamp from the file reading and JSON parsing process, and thus use
this method to produce initially a record without a timestamp. The TimestampAssigner
in the next step would be used to extract timestamp from a field of the JSON object.
collect
in interface SourceOutput<T>
record
- the record to emit.public final void collect(T record, long timestamp)
SourceOutput
Use this method if the source system has timestamps attached to records. Typical examples would be Logs, PubSubs, or Message Queues, like Kafka or Kinesis, which store a timestamp with each event.
The events typically still pass through a TimestampAssigner
, which may decide to
either use this source-provided timestamp, or replace it with a timestamp stored within the
event (for example if the event was a JSON object one could configure aTimestampAssigner that
extracts one of the object's fields and uses that as a timestamp).
collect
in interface SourceOutput<T>
record
- the record to emit.timestamp
- the timestamp of the record.public final void emitWatermark(Watermark watermark)
WatermarkOutput
Emitting a watermark also implicitly marks the stream as active, ending previously marked idleness.
emitWatermark
in interface WatermarkOutput
public final void markIdle()
WatermarkOutput
An output becomes active again as soon as the next watermark is emitted or WatermarkOutput.markActive()
is explicitly called.
markIdle
in interface WatermarkOutput
public void markActive()
WatermarkOutput
markActive
in interface WatermarkOutput
public final void emitPeriodicWatermark()
public static <E> SourceOutputWithWatermarks<E> createWithSeparateOutputs(PushingAsyncDataInput.DataOutput<E> recordsOutput, WatermarkOutput onEventWatermarkOutput, WatermarkOutput periodicWatermarkOutput, TimestampAssigner<E> timestampAssigner, WatermarkGenerator<E> watermarkGenerator)
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.