@Experimental public abstract class HBaseInputFormat<T extends Tuple> extends RichInputFormat<T,org.apache.flink.connector.hbase.source.TableInputSplit>
InputFormat subclass that wraps the access for HTables.

Modifier and Type | Field and Description
---|---
protected byte[] | currentRow
protected boolean | endReached
protected static org.slf4j.Logger | LOG
protected org.apache.hadoop.hbase.client.ResultScanner | resultScanner. HBase iterator wrapper.
protected org.apache.hadoop.hbase.client.Scan | scan
protected long | scannedRows
protected byte[] | serializedConfig
protected org.apache.hadoop.hbase.client.HTable | table
Constructor and Description
---|
HBaseInputFormat(Configuration hConf). Constructs an InputFormat with an HBase configuration to read data from HBase.
Modifier and Type | Method and Description
---|---
void | close(). Method that marks the end of the life-cycle of an input split.
void | closeInputFormat(). Closes this InputFormat instance.
void | configure(Configuration parameters). Creates a Scan object and opens the HTable connection.
org.apache.flink.connector.hbase.source.TableInputSplit[] | createInputSplits(int minNumSplits). Computes the input splits.
protected Configuration | getHadoopConfiguration()
InputSplitAssigner | getInputSplitAssigner(org.apache.flink.connector.hbase.source.TableInputSplit[] inputSplits). Returns the assigner for the input splits.
protected abstract org.apache.hadoop.hbase.client.Scan | getScanner(). Returns an instance of Scan that retrieves the required subset of records from the HBase table.
BaseStatistics | getStatistics(BaseStatistics cachedStatistics). Gets the basic statistics from the input described by this format.
protected abstract String | getTableName(). Returns the name of the table to be read.
protected boolean | includeRegionInScan(byte[] startKey, byte[] endKey). Tests whether the given region is to be included in the scan while splitting the regions of a table.
protected T | mapResultToOutType(org.apache.hadoop.hbase.client.Result r). Maps an HBase Result instance into the output type T.
protected abstract T | mapResultToTuple(org.apache.hadoop.hbase.client.Result r). Copies the data of an HBase Result instance into the required Tuple.
T | nextRecord(T reuse). Reads the next record from the input.
void | open(org.apache.flink.connector.hbase.source.TableInputSplit split). Opens a parallel instance of the input format to work on a split.
boolean | reachedEnd(). Checks whether the end of the input is reached.

Methods inherited from class RichInputFormat: getRuntimeContext, openInputFormat, setRuntimeContext
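For orientation, the Flink runtime invokes these methods in a fixed order. The fragment below is a simplified, single-task driver sketch, not runnable against a real cluster; the subclass name `MyHBaseInputFormat` and the single-split hint are illustrative assumptions, not part of the connector:

```java
// Simplified life-cycle sketch; in practice the Flink runtime performs these
// calls and distributes the splits across parallel task instances.
// MyHBaseInputFormat is a hypothetical concrete subclass.
HBaseInputFormat<Tuple2<String, String>> format = new MyHBaseInputFormat(hbaseConf);
format.configure(parameters);                 // builds the Scan, opens the HTable
TableInputSplit[] splits = format.createInputSplits(1);
for (TableInputSplit split : splits) {
    format.open(split);                       // positions the scanner at the split's start row
    Tuple2<String, String> reuse = new Tuple2<>();
    while (!format.reachedEnd()) {
        Tuple2<String, String> record = format.nextRecord(reuse);
        // process the record ...
    }
    format.close();                           // ends this split's life-cycle
}
format.closeInputFormat();                    // releases the HTable connection
```

Note that `configure` must come before `createInputSplits`, as explained in the `configure(Configuration)` detail below.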
protected static final org.slf4j.Logger LOG
protected boolean endReached
protected transient org.apache.hadoop.hbase.client.HTable table
protected transient org.apache.hadoop.hbase.client.Scan scan
protected org.apache.hadoop.hbase.client.ResultScanner resultScanner
protected byte[] currentRow
protected long scannedRows
protected byte[] serializedConfig
public HBaseInputFormat(Configuration hConf)
Constructs an InputFormat with an HBase configuration to read data from HBase.
hConf - The configuration used to connect to HBase. At least hbase.zookeeper.quorum and zookeeper.znode.parent need to be set.

protected abstract org.apache.hadoop.hbase.client.Scan getScanner()
Returns an instance of Scan that retrieves the required subset of records from the HBase table.
protected abstract String getTableName()
Returns the name of the table to be read.
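A concrete subclass supplies the table name, the scan, and the result-to-tuple mapping. A minimal sketch, assuming a hypothetical `users` table with a `cf:name` column; the table, family, and qualifier names are illustrative, and the import of `HBaseInputFormat` assumes it lives in `org.apache.flink.connector.hbase.source` like the other types on this page:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.hbase.source.HBaseInputFormat; // assumed package
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical subclass reading (rowKey, name) pairs from a "users" table.
public class UsersInputFormat extends HBaseInputFormat<Tuple2<String, String>> {

    public UsersInputFormat(Configuration hConf) {
        super(hConf);
    }

    @Override
    protected String getTableName() {
        return "users"; // assumed table name
    }

    @Override
    protected Scan getScanner() {
        // Restrict the scan to the single column mapped below.
        return new Scan().addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"));
    }

    @Override
    protected Tuple2<String, String> mapResultToTuple(Result r) {
        String rowKey = Bytes.toString(r.getRow());
        String name = Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name")));
        return Tuple2.of(rowKey, name);
    }
}
```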
protected abstract T mapResultToTuple(org.apache.hadoop.hbase.client.Result r)
The output from HBase is always an instance of Result. This method copies the data of the Result instance into the required Tuple.
r - The Result instance from HBase that needs to be converted
Returns: The Tuple that contains the needed information.

public void configure(Configuration parameters)
Creates a Scan object and opens the HTable connection. These are opened here because they are needed in createInputSplits, which is called before the openInputFormat method. The connection is therefore opened in configure(Configuration) and closed in closeInputFormat().
Specified by: configure in interface InputFormat<T extends Tuple,org.apache.flink.connector.hbase.source.TableInputSplit>
parameters - The configuration that is to be used
See Also: Configuration
protected T mapResultToOutType(org.apache.hadoop.hbase.client.Result r)
HBase returns an instance of Result. This method maps the returned Result instance into the output type T.
r - The Result instance from HBase that needs to be converted
Returns: The T that contains the data of the Result.

protected Configuration getHadoopConfiguration()
public void open(org.apache.flink.connector.hbase.source.TableInputSplit split) throws IOException
Description copied from interface: InputFormat
Opens a parallel instance of the input format to work on a split. When this method is called, the input format is guaranteed to be configured.
split - The split to be opened.
Throws: IOException - Thrown, if the split could not be opened due to an I/O problem.

public T nextRecord(T reuse) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input. When this method is called, the input format is guaranteed to be opened.
reuse - Object that may be reused.
Throws: IOException - Thrown, if an I/O error occurred.

public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Checks whether the end of the input is reached. When this method is called, the input format is guaranteed to be opened.
Throws: IOException - Thrown, if an I/O error occurred.

public void close() throws IOException
Description copied from interface: InputFormat
Method that marks the end of the life-cycle of an input split. When this method is called, the input format is guaranteed to be opened.
Throws: IOException - Thrown, if the input could not be closed properly.

public void closeInputFormat() throws IOException
Description copied from class: RichInputFormat
Closes this InputFormat instance. All resources that were allocated in RichInputFormat.openInputFormat() should be closed in this method.
Overrides: closeInputFormat in class RichInputFormat<T,org.apache.flink.connector.hbase.source.TableInputSplit>
Throws: IOException - in case closing the resources failed

public org.apache.flink.connector.hbase.source.TableInputSplit[] createInputSplits(int minNumSplits) throws IOException
Description copied from interface: InputSplitSource
Computes the input splits.
minNumSplits - The minimum number of input splits, as a hint.
Throws: IOException
protected boolean includeRegionInScan(byte[] startKey, byte[] endKey)
Tests whether the given region is to be included in the scan while splitting the regions of a table.
startKey - Start key of the region
endKey - End key of the region

public InputSplitAssigner getInputSplitAssigner(org.apache.flink.connector.hbase.source.TableInputSplit[] inputSplits)
Description copied from interface: InputSplitSource
Returns the assigner for the input splits.
public BaseStatistics getStatistics(BaseStatistics cachedStatistics)
Description copied from interface: InputFormat
Gets the basic statistics from the input described by this format. When this method is called, the input format is guaranteed to be configured.
cachedStatistics - The statistics that were cached. May be null.

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.