pyflink.datastream.connectors.file_system.StreamFormat
- class StreamFormat(j_stream_format)
A reader format that reads individual records from a stream.
Compared to the BulkFormat, the stream format handles a few things out of the box, such as deciding how to batch records and dealing with compression.
Internally in the file source, the readers pass batches of records from the reading threads (which perform the typically blocking I/O operations) to the async mailbox threads that do the streaming and batch data processing. Passing records in batches (rather than one at a time) significantly reduces the thread-to-thread handover overhead.
By default, this batching is based on the I/O fetch size for the StreamFormat, meaning that the set of records derived from one I/O buffer is handed over as one batch. Use the config option source.file.stream.io-fetch-size to configure that fetch size.
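The fetch-size-based batching described above can be illustrated with a small pure-Python toy model. This is only a conceptual sketch and not Flink's actual implementation: the function name read_batches and its fetch_size parameter are hypothetical, and the handling of lines that straddle two buffers is a simplification chosen for this example.

```python
import io
from typing import BinaryIO, Iterator, List


def read_batches(stream: BinaryIO, fetch_size: int) -> Iterator[List[str]]:
    """Yield one batch of text lines per I/O fetch of at most `fetch_size` bytes.

    Hypothetical helper for illustration; it only mimics the idea that all
    records derived from one I/O buffer are handed over together.
    """
    carry = b""  # bytes of a line that was cut off at the end of the last buffer
    while True:
        chunk = stream.read(fetch_size)  # one (typically blocking) I/O fetch
        if not chunk:
            break
        # Every element except the last is a complete line; the last one may
        # be partial and is carried over into the next fetch.
        *complete, carry = (carry + chunk).split(b"\n")
        if complete:
            yield [line.decode("utf-8") for line in complete]
    if carry:
        # Final line without a trailing newline.
        yield [carry.decode("utf-8")]


if __name__ == "__main__":
    data = io.BytesIO(b"a\nbb\nccc\ndddd\n")
    for batch in read_batches(data, fetch_size=6):
        print(batch)
```

In this model a larger fetch size produces fewer, larger batches, which is the trade-off the fetch-size option controls: fewer handovers between threads at the cost of more records buffered per handover.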
Methods
text_line_format([charset_name])
Creates a reader format that reads text lines from a file.
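A minimal usage sketch, assuming a working PyFlink installation: the text line format is typically passed to a FileSource, which is then attached to a streaming environment. The function name build_text_line_stream, the input_path argument, and the source name "text-file-source" are placeholders chosen for this example, not part of the API.

```python
def build_text_line_stream(input_path: str, charset_name: str = "UTF-8"):
    """Return (env, stream) where stream emits each line of the file(s) as a string.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so that merely loading this module does
    # not require PyFlink to be installed.
    from pyflink.common import WatermarkStrategy
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.connectors.file_system import FileSource, StreamFormat

    env = StreamExecutionEnvironment.get_execution_environment()

    # text_line_format() yields a StreamFormat that reads one record per text line.
    line_format = StreamFormat.text_line_format(charset_name)

    # The FileSource uses the format to read records from the given path.
    source = FileSource.for_record_stream_format(line_format, input_path).build()

    stream = env.from_source(
        source,
        WatermarkStrategy.no_watermarks(),  # no event-time handling in this sketch
        "text-file-source",
    )
    return env, stream
```

A caller would then add downstream operators (for example stream.print()) and run the job with env.execute().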