@PublicEvolving public class FlinkKafkaConsumer08<T> extends FlinkKafkaConsumerBase<T>
The Flink Kafka Consumer participates in checkpointing and guarantees that no data is lost during a failure, and that the computation processes elements "exactly once". (Note: These guarantees naturally assume that Kafka itself does not lose any data.)
Flink's Kafka Consumer is designed to be compatible with Kafka's High-Level Consumer API (0.8.x). Most of Kafka's configuration variables can be used with this consumer as well:
Offsets whose records have been read and are checkpointed will be committed back to ZooKeeper by the offset handler. In addition, the offset handler finds the point where the source initially starts reading from the stream, when the streaming job is started.
Please note that Flink snapshots the offsets internally as part of its distributed checkpoints. The offsets committed to Kafka / ZooKeeper are only to bring the outside view of progress in sync with Flink's view of the progress. That way, monitoring and other jobs can get a view of how far the Flink Kafka consumer has consumed a topic.
If checkpointing is disabled, the consumer will periodically commit the current offset to ZooKeeper.
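The relationship between Flink's internal offset snapshots and the offsets committed to Kafka / ZooKeeper can be illustrated with a small model. This is a plain-Java sketch and not Flink's implementation: the class name, the string partition keys, and the in-memory map standing in for ZooKeeper are all assumptions made for illustration. Only the method names `snapshotState` and `notifyCheckpointComplete` are borrowed from the inherited methods listed on this page.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (not Flink code): offsets are snapshotted per checkpoint and committed once it completes. */
final class OffsetCheckpointSketch {
    // Offset of the last record read per partition; the String key is a hypothetical "topic-partition" name.
    private final Map<String, Long> currentOffsets = new HashMap<>();
    // Offsets captured per checkpoint ID, waiting for that checkpoint to be confirmed.
    private final Map<Long, Map<String, Long>> pendingCheckpoints = new LinkedHashMap<>();
    // Stand-in for the externally visible offsets in Kafka / ZooKeeper.
    private final Map<String, Long> committedOffsets = new HashMap<>();

    void recordRead(String partition, long offset) {
        currentOffsets.put(partition, offset);
    }

    // Called when a checkpoint is taken: remember a copy of the offsets read so far.
    void snapshotState(long checkpointId) {
        pendingCheckpoints.put(checkpointId, new HashMap<>(currentOffsets));
    }

    // Called when a checkpoint is confirmed: only now do the offsets become visible externally.
    void notifyCheckpointComplete(long checkpointId) {
        Map<String, Long> offsets = pendingCheckpoints.get(checkpointId);
        if (offsets == null) {
            return; // unknown or already handled checkpoint
        }
        committedOffsets.putAll(offsets);
        // Drop this checkpoint and any older ones that are still pending.
        pendingCheckpoints.keySet().removeIf(id -> id <= checkpointId);
    }

    Map<String, Long> committedOffsets() {
        return committedOffsets;
    }

    public static void main(String[] args) {
        OffsetCheckpointSketch source = new OffsetCheckpointSketch();
        source.recordRead("events-0", 41L);
        source.snapshotState(1L);
        source.recordRead("events-0", 42L); // read after the snapshot, not part of checkpoint 1
        source.notifyCheckpointComplete(1L);
        System.out.println(source.committedOffsets()); // prints {events-0=41}
    }
}
```

In this model the committed view lags behind the read position, which matches the description above: the external offsets only report progress, while recovery relies on the snapshot held by the checkpoint.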
When using a Kafka topic to send data between Flink jobs, we recommend using the TypeInformationSerializationSchema and TypeInformationKeyValueSerializationSchema.
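The class description above notes that the consumer accepts most Kafka configuration variables and needs settings for both the Kafka client and the ZooKeeper client. The following is a minimal sketch of such a `Properties` object using only the JDK. The host names, the group name, and the topic in the comment are placeholders, and the choice of keys (`bootstrap.servers`, `zookeeper.connect`, `group.id`, `auto.offset.reset`) is an assumption based on common Kafka 0.8 setups rather than something this page specifies.

```java
import java.util.Properties;

/** Sketch of a Properties object for a Kafka 0.8 consumer; all values are placeholders. */
final class Kafka08ConsumerPropertiesSketch {

    static Properties buildConsumerProperties() {
        Properties props = new Properties();
        // Kafka broker(s) to contact (placeholder address).
        props.setProperty("bootstrap.servers", "localhost:9092");
        // ZooKeeper quorum used for offset handling (placeholder address).
        props.setProperty("zookeeper.connect", "localhost:2181");
        // Consumer group under which offsets are committed (placeholder name).
        props.setProperty("group.id", "example-group");
        // Assumed Kafka 0.8 key: where to start when no committed offset exists ("smallest" or "largest").
        props.setProperty("auto.offset.reset", "smallest");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConsumerProperties();
        // With the Flink Kafka 0.8 connector on the classpath, these properties would be
        // passed to one of the constructors documented on this page, for example:
        //   new FlinkKafkaConsumer08<>("example-topic", new SimpleStringSchema(), props);
        props.stringPropertyNames().stream()
                .sorted()
                .forEach(key -> System.out.println(key + " = " + props.getProperty(key)));
    }
}
```

The constructor call is left as a comment so that the sketch compiles without the connector dependency.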
SourceFunction.SourceContext<T>
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_GET_PARTITIONS_RETRIES: Default number of retries for getting the partition info. |
static String | GET_PARTITIONS_RETRIES_KEY: Configuration key for the number of retries for getting the partition info. |
deserializer, KEY_DISABLE_METRICS, KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, LOG, MAX_NUM_PENDING_CHECKPOINTS, PARTITION_DISCOVERY_DISABLED
Constructor and Description |
---|
FlinkKafkaConsumer08(List<String> topics, DeserializationSchema<T> deserializer, Properties props): Creates a new Kafka streaming source consumer for Kafka 0.8.x. |
FlinkKafkaConsumer08(List<String> topics, KafkaDeserializationSchema<T> deserializer, Properties props): Creates a new Kafka streaming source consumer for Kafka 0.8.x. |
FlinkKafkaConsumer08(Pattern subscriptionPattern, DeserializationSchema<T> valueDeserializer, Properties props): Creates a new Kafka streaming source consumer for Kafka 0.8.x. |
FlinkKafkaConsumer08(Pattern subscriptionPattern, KafkaDeserializationSchema<T> deserializer, Properties props): Creates a new Kafka streaming source consumer for Kafka 0.8.x. |
FlinkKafkaConsumer08(String topic, DeserializationSchema<T> valueDeserializer, Properties props): Creates a new Kafka streaming source consumer for Kafka 0.8.x. |
FlinkKafkaConsumer08(String topic, KafkaDeserializationSchema<T> deserializer, Properties props): Creates a new Kafka streaming source consumer for Kafka 0.8.x. |
Modifier and Type | Method and Description |
---|---|
protected AbstractFetcher<T,?> | createFetcher(SourceFunction.SourceContext<T> sourceContext, Map<KafkaTopicPartition,Long> assignedPartitionsWithInitialOffsets, SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated, StreamingRuntimeContext runtimeContext, OffsetCommitMode offsetCommitMode, MetricGroup consumerMetricGroup, boolean useMetrics): Creates the fetcher that connects to the Kafka brokers, pulls data, deserializes the data, and emits it into the data streams. |
protected AbstractPartitionDiscoverer | createPartitionDiscoverer(KafkaTopicsDescriptor topicsDescriptor, int indexOfThisSubtask, int numParallelSubtasks): Creates the partition discoverer that is used to find new partitions for this subtask. |
protected Map<KafkaTopicPartition,Long> | fetchOffsetsWithTimestamp(Collection<KafkaTopicPartition> partitions, long timestamp) |
protected boolean | getIsAutoCommitEnabled() |
protected static void | validateZooKeeperConfig(Properties props): Validates the ZooKeeper configuration, checking for required parameters. |
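The kind of check that validateZooKeeperConfig is described as performing can be sketched in plain Java. This is a hypothetical stand-in, not the method's actual body: which keys are treated as required (`zookeeper.connect`, `group.id`) and which optional timeouts must parse as integers are assumptions, as are the helper names and the choice of `IllegalArgumentException`.

```java
import java.util.Properties;

/** Hypothetical sketch of a ZooKeeper configuration check; not the connector's actual code. */
final class ZooKeeperConfigCheckSketch {

    static void checkZooKeeperConfig(Properties props) {
        // Assumed required keys for a ZooKeeper-backed Kafka 0.8 consumer.
        requirePresent(props, "zookeeper.connect");
        requirePresent(props, "group.id");
        // Assumed optional keys that must be integers when they are set.
        requireIntegerIfSet(props, "zookeeper.session.timeout.ms");
        requireIntegerIfSet(props, "zookeeper.connection.timeout.ms");
    }

    private static void requirePresent(Properties props, String key) {
        if (props.getProperty(key) == null) {
            throw new IllegalArgumentException("Required property '" + key + "' has not been set");
        }
    }

    private static void requireIntegerIfSet(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            return; // optional property, nothing to validate
        }
        try {
            Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Property '" + key + "' is not a valid integer", e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181"); // placeholder address
        props.setProperty("group.id", "example-group");           // placeholder group
        checkZooKeeperConfig(props);
        System.out.println("configuration accepted");

        try {
            checkZooKeeperConfig(new Properties());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on missing keys at construction time surfaces configuration mistakes before the job is submitted, which is presumably why the check is exposed as a static helper.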
assignTimestampsAndWatermarks, assignTimestampsAndWatermarks, cancel, close, disableFilterRestoredPartitionsWithSubscribedTopics, getProducedType, initializeState, notifyCheckpointComplete, open, run, setCommitOffsetsOnCheckpoints, setStartFromEarliest, setStartFromGroupOffsets, setStartFromLatest, setStartFromSpecificOffsets, setStartFromTimestamp, snapshotState
getIterationRuntimeContext, getRuntimeContext, setRuntimeContext
public static final String GET_PARTITIONS_RETRIES_KEY
public static final int DEFAULT_GET_PARTITIONS_RETRIES
public FlinkKafkaConsumer08(String topic, DeserializationSchema<T> valueDeserializer, Properties props)
topic - The name of the topic that should be consumed.
valueDeserializer - The de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties used to configure the Kafka consumer client, and the ZooKeeper client.

public FlinkKafkaConsumer08(String topic, KafkaDeserializationSchema<T> deserializer, Properties props)
This constructor allows passing a KafkaDeserializationSchema for reading key/value pairs, offsets, and topic names from Kafka.
topic - The name of the topic that should be consumed.
deserializer - The keyed de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties used to configure the Kafka consumer client, and the ZooKeeper client.

public FlinkKafkaConsumer08(List<String> topics, DeserializationSchema<T> deserializer, Properties props)
This constructor allows passing multiple topics to the consumer.
topics - The Kafka topics to read from.
deserializer - The de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties that are used to configure both the fetcher and the offset handler.

public FlinkKafkaConsumer08(List<String> topics, KafkaDeserializationSchema<T> deserializer, Properties props)
This constructor allows passing multiple topics and a key/value deserialization schema.
topics - The Kafka topics to read from.
deserializer - The keyed de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties that are used to configure both the fetcher and the offset handler.

@PublicEvolving public FlinkKafkaConsumer08(Pattern subscriptionPattern, DeserializationSchema<T> valueDeserializer, Properties props)
If partition discovery is enabled (by setting a non-negative value for FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS in the properties), topics with names matching the pattern will also be subscribed to as they are created on the fly.
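How a subscription pattern selects topics can be tried out with the JDK alone. The sketch below assumes that a topic is subscribed when its whole name matches the regular expression. The topic names, the pattern, and the helper `matchingTopics` are made up for illustration and are not part of the connector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch of selecting topics by regular expression; assumes full-name matching. */
final class TopicPatternSketch {

    // Hypothetical helper: returns the topics whose entire name matches the pattern.
    static List<String> matchingTopics(Pattern subscriptionPattern, List<String> availableTopics) {
        return availableTopics.stream()
                .filter(topic -> subscriptionPattern.matcher(topic).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern subscriptionPattern = Pattern.compile("orders-[0-9]+");
        List<String> topics = Arrays.asList("orders-1", "orders-22", "payments-1", "orders-x");

        // Prints [orders-1, orders-22]
        System.out.println(matchingTopics(subscriptionPattern, topics));

        // With the connector on the classpath, the same Pattern would be passed to the
        // constructor above. Picking up topics created later additionally requires the
        // discovery interval property named by KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS.
    }
}
```

Testing the pattern against a list of known topic names before deploying helps avoid subscribing to more topics than intended.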
subscriptionPattern - The regular expression for a pattern of topic names to subscribe to.
valueDeserializer - The de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties used to configure the Kafka consumer client, and the ZooKeeper client.

@PublicEvolving public FlinkKafkaConsumer08(Pattern subscriptionPattern, KafkaDeserializationSchema<T> deserializer, Properties props)
If partition discovery is enabled (by setting a non-negative value for FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS in the properties), topics with names matching the pattern will also be subscribed to as they are created on the fly.
This constructor allows passing a KafkaDeserializationSchema for reading key/value pairs, offsets, and topic names from Kafka.
subscriptionPattern - The regular expression for a pattern of topic names to subscribe to.
deserializer - The keyed de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties used to configure the Kafka consumer client, and the ZooKeeper client.

protected AbstractFetcher<T,?> createFetcher(SourceFunction.SourceContext<T> sourceContext, Map<KafkaTopicPartition,Long> assignedPartitionsWithInitialOffsets, SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated, StreamingRuntimeContext runtimeContext, OffsetCommitMode offsetCommitMode, MetricGroup consumerMetricGroup, boolean useMetrics) throws Exception
createFetcher in class FlinkKafkaConsumerBase<T>
sourceContext - The source context to emit data to.
assignedPartitionsWithInitialOffsets - The set of partitions that this subtask should handle, with their start offsets.
watermarksPeriodic - Optional, a serialized timestamp extractor / periodic watermark generator.
watermarksPunctuated - Optional, a serialized timestamp extractor / punctuated watermark generator.
runtimeContext - The task's runtime context.
Exception - The method should forward exceptions.

protected AbstractPartitionDiscoverer createPartitionDiscoverer(KafkaTopicsDescriptor topicsDescriptor, int indexOfThisSubtask, int numParallelSubtasks)
createPartitionDiscoverer in class FlinkKafkaConsumerBase<T>
topicsDescriptor - Descriptor that describes whether we are discovering partitions for fixed topics or a topic pattern.
indexOfThisSubtask - The index of this consumer subtask.
numParallelSubtasks - The total number of parallel consumer subtasks.

protected boolean getIsAutoCommitEnabled()
getIsAutoCommitEnabled in class FlinkKafkaConsumerBase<T>
protected Map<KafkaTopicPartition,Long> fetchOffsetsWithTimestamp(Collection<KafkaTopicPartition> partitions, long timestamp)
fetchOffsetsWithTimestamp in class FlinkKafkaConsumerBase<T>
protected static void validateZooKeeperConfig(Properties props)
props - Properties to check

Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.