public class OrcRowInputFormat extends FileInputFormat<Row> implements ResultTypeQueryable<Row>
Modifier and Type | Class and Description |
---|---|
static class |
OrcRowInputFormat.Between
A BETWEEN predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.Equals
An EQUALS predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.In
An IN predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.IsNull
An IS_NULL predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.LessThan
A LESS_THAN predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.LessThanEquals
A LESS_THAN_EQUALS predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.Not
A NOT predicate to negate a predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.NullSafeEquals
An EQUALS predicate that can be evaluated with Null safety by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.Or
An OR predicate that can be evaluated by the OrcRowInputFormat.
|
static class |
OrcRowInputFormat.Predicate
A filter predicate that can be evaluated by the OrcRowInputFormat.
|
FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
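The nested predicate classes listed above include LessThan, LessThanEquals, Not, and Or, but no greater-than variant. The following is a minimal stand-alone sketch of how the missing comparisons could be composed from the listed ones. It models predicates with `java.util.function.IntPredicate`; the class `PredicateCompositionSketch` and all of its methods are hypothetical and are not the OrcRowInputFormat classes or their constructors.

```java
import java.util.function.IntPredicate;

/** Stand-alone model of predicate composition; NOT the Flink predicate classes. */
final class PredicateCompositionSketch {

    // Building blocks that mirror the names of the listed nested classes.
    static IntPredicate lessThan(int bound) {
        return v -> v < bound;
    }

    static IntPredicate lessThanEquals(int bound) {
        return v -> v <= bound;
    }

    static IntPredicate not(IntPredicate p) {
        return p.negate();
    }

    static IntPredicate or(IntPredicate a, IntPredicate b) {
        return a.or(b);
    }

    // Derived comparisons (hypothetical helpers): x > b is NOT(x <= b).
    static IntPredicate greaterThan(int bound) {
        return not(lessThanEquals(bound));
    }

    // x >= b is NOT(x < b).
    static IntPredicate greaterThanEquals(int bound) {
        return not(lessThan(bound));
    }

    // Values outside the closed range [lo, hi]: x < lo OR x > hi.
    static IntPredicate outside(int lo, int hi) {
        return or(lessThan(lo), greaterThan(hi));
    }
}
```

This sketch uses primitive ints and therefore ignores null handling, which the real predicates (IsNull, NullSafeEquals) address separately.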
Constructor and Description |
---|
OrcRowInputFormat(String path,
String schemaString,
Configuration orcConfig)
Creates an OrcRowInputFormat.
|
OrcRowInputFormat(String path,
String schemaString,
Configuration orcConfig,
int batchSize)
Creates an OrcRowInputFormat.
|
OrcRowInputFormat(String path,
org.apache.orc.TypeDescription orcSchema,
Configuration orcConfig,
int batchSize)
Creates an OrcRowInputFormat.
|
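Two of the constructors above take the schema as a String. ORC schema strings for row data commonly use the form `struct<name:type,...>`. The sketch below shows one way such a string could be assembled from an ordered field map. The class `OrcSchemaStringSketch` and its method `structOf` are hypothetical helpers for illustration only; they are not part of OrcRowInputFormat or of the ORC library, and they handle flat schemas only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a flat ORC struct schema string. */
final class OrcSchemaStringSketch {

    /**
     * Joins the entries as "name:type", in the map's iteration order,
     * and wraps them in "struct<...>".
     */
    static String structOf(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(",", "struct<", ">");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add(field.getKey() + ":" + field.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order, which fixes the field order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "int");
        fields.put("name", "string");
        System.out.println(structOf(fields)); // prints struct<id:int,name:string>
    }
}
```

The field order matters because selectFields refers to fields of the ORC schema by index.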
Modifier and Type | Method and Description |
---|---|
void |
addPredicate(OrcRowInputFormat.Predicate predicate)
Adds a filter predicate to reduce the number of rows to be returned by the input format.
|
void |
close()
Closes the file input stream of the input format.
|
void |
closeInputFormat()
Closes this InputFormat instance.
|
TypeInformation<Row> |
getProducedType()
Gets the data type (as a TypeInformation) produced by this function or input format.
|
Row |
nextRecord(Row reuse)
Reads the next record from the input.
|
void |
open(FileInputSplit fileSplit)
Opens an input stream to the file defined in the input format.
|
void |
openInputFormat()
Opens this InputFormat instance.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
void |
selectFields(int... selectedFields)
Selects the fields from the ORC schema that are returned by InputFormat.
|
boolean |
supportsMultiPaths()
Override this method to support multiple paths.
|
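The methods in the table above are usually called in a fixed order: openInputFormat once, then for each split open, a loop over reachedEnd and nextRecord, and close, and finally closeInputFormat. The sketch below illustrates that call order with a stand-alone model. The interface `SplitReader`, the class `InMemoryReader`, and the driver `readAll` are hypothetical stand-ins that only borrow the method names; they are not Flink types, and in a Flink job the runtime drives these calls rather than user code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Stand-alone model of the read lifecycle; NOT the Flink interfaces. */
final class ReadLoopSketch {

    /** Hypothetical interface that mirrors the method names in the summary table. */
    interface SplitReader<T, S> {
        void openInputFormat();
        void open(S split);
        boolean reachedEnd();
        T nextRecord(T reuse);
        void close();
        void closeInputFormat();
    }

    /** Toy reader whose splits are in-memory lists of strings. */
    static final class InMemoryReader implements SplitReader<String, List<String>> {
        private Iterator<String> current;

        public void openInputFormat() { /* allocate shared resources here */ }
        public void open(List<String> split) { current = split.iterator(); }
        public boolean reachedEnd() { return !current.hasNext(); }
        public String nextRecord(String reuse) { return current.next(); }
        public void close() { current = null; }
        public void closeInputFormat() { /* release shared resources here */ }
    }

    /** Reads every record of every split, closing resources even on failure. */
    static <T, S> List<T> readAll(SplitReader<T, S> reader, List<S> splits) {
        List<T> records = new ArrayList<>();
        reader.openInputFormat();
        try {
            for (S split : splits) {
                reader.open(split);
                try {
                    while (!reader.reachedEnd()) {
                        // No reuse object is passed in this toy model.
                        records.add(reader.nextRecord(null));
                    }
                } finally {
                    reader.close();
                }
            }
        } finally {
            reader.closeInputFormat();
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> splits =
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(readAll(new InMemoryReader(), splits)); // prints [a, b, c]
    }
}
```

The model passes no reuse object. When a reader does reuse the object handed to nextRecord, a caller that collects records would need to copy each one before requesting the next.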
acceptFile, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
getRuntimeContext, setRuntimeContext
public OrcRowInputFormat(String path, String schemaString, Configuration orcConfig)
Parameters:
path - The path to read ORC files from.
schemaString - The schema of the ORC files as String.
orcConfig - The configuration to read the ORC files with.
public OrcRowInputFormat(String path, String schemaString, Configuration orcConfig, int batchSize)
Parameters:
path - The path to read ORC files from.
schemaString - The schema of the ORC files as String.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public OrcRowInputFormat(String path, org.apache.orc.TypeDescription orcSchema, Configuration orcConfig, int batchSize)
Parameters:
path - The path to read ORC files from.
orcSchema - The schema of the ORC files as ORC TypeDescription.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public void addPredicate(OrcRowInputFormat.Predicate predicate)
Note: Predicates can significantly reduce the amount of data that is read. However, the OrcRowInputFormat does not guarantee that all returned rows satisfy the predicates. Moreover, a predicate is only applied if the field it references is among the selected fields.
Parameters:
predicate - The filter predicate.
public void selectFields(int... selectedFields)
Parameters:
selectedFields - The indices of the fields of the ORC schema that are returned by the InputFormat.
public void openInputFormat() throws IOException
Description copied from class: RichInputFormat
Opens this InputFormat instance.
Overrides:
openInputFormat in class RichInputFormat<Row,FileInputSplit>
Throws:
IOException - in case allocating the resources failed.
See Also:
InputFormat
public void open(FileInputSplit fileSplit) throws IOException
Description copied from class: FileInputFormat
Opens an input stream to the file defined in the input format. The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
Specified by:
open in interface InputFormat<Row,FileInputSplit>
Overrides:
open in class FileInputFormat<Row>
Parameters:
fileSplit - The split to be opened.
Throws:
IOException - Thrown, if the split could not be opened due to an I/O problem.
public void close() throws IOException
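Description copied from class: FileInputFormat
Closes the file input stream of the input format.
Specified by:
close in interface InputFormat<Row,FileInputSplit>
Overrides:
close in class FileInputFormat<Row>
Throws:
IOException - Thrown, if the input could not be closed properly.
public void closeInputFormat() throws IOException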
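Description copied from class: RichInputFormat
Closes this InputFormat instance. Resources allocated during RichInputFormat.openInputFormat() should be closed in this method.
Overrides:
closeInputFormat in class RichInputFormat<Row,FileInputSplit>
Throws:
IOException - in case closing the resources failed.
See Also:
InputFormat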
public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached.
When this method is called, the input format is guaranteed to be opened.
Specified by:
reachedEnd in interface InputFormat<Row,FileInputSplit>
Throws:
IOException - Thrown, if an I/O error occurred.
public Row nextRecord(Row reuse) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
Specified by:
nextRecord in interface InputFormat<Row,FileInputSplit>
Parameters:
reuse - Object that may be reused.
Throws:
IOException - Thrown, if an I/O error occurred.
public TypeInformation<Row> getProducedType()
Description copied from interface: ResultTypeQueryable
Gets the data type (as a TypeInformation) produced by this function or input format.
Specified by:
getProducedType in interface ResultTypeQueryable<Row>
public boolean supportsMultiPaths()
Description copied from class: FileInputFormat
Override this method to support multiple paths.
Overrides:
supportsMultiPaths in class FileInputFormat<Row>
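The note on addPredicate states that returned rows are not guaranteed to satisfy the predicates. A consumer that needs exact results can therefore apply the same condition again after reading. The following is a minimal sketch of that re-check, assuming rows are modeled as plain `Object[]` arrays. The class `RecheckFilterSketch` and its method `recheck` are hypothetical and use no Flink types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Stand-alone model of re-applying a filter after a best-effort pushdown. */
final class RecheckFilterSketch {

    /**
     * Returns only the rows that satisfy the filter. The input may still contain
     * non-matching rows, because a pushed-down predicate is treated here as an
     * optimization and not as a guarantee.
     */
    static List<Object[]> recheck(List<Object[]> rows, Predicate<Object[]> filter) {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            if (filter.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Rows as they might come back from a reader with the predicate id < 5.
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a"});
        rows.add(new Object[] {7, "b"}); // slipped through the pushed-down predicate

        List<Object[]> exact = recheck(rows, row -> (Integer) row[0] < 5);
        System.out.println(exact.size()); // prints 1
    }
}
```

In a Flink program the equivalent step would typically be a filter operator placed after the source, so the pushed-down predicate reduces I/O while the downstream filter provides correctness.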
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.