@Internal public class BlockSplittingRecursiveAllDirEnumerator extends BlockSplittingRecursiveEnumerator
This FileEnumerator enumerates all files under the given paths recursively, except those in hidden directories, and creates a separate split for each file block.
Please note that file blocks are only exposed by some file systems, such as HDFS. For file systems that do not expose block information, the enumerator does not create multiple splits per file but keeps each file as a single source split.
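As a rough illustration of the per-block behavior described above, the sketch below divides a file of known length into block-sized byte ranges and falls back to a single range when no block size is available. It is not Flink's implementation: `BlockSplitSketch` and `splitByBlocks` are hypothetical names, and a fixed block size stands in for the block locations that a file system such as HDFS would report.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplitSketch {

    /**
     * Returns {offset, length} pairs covering a file of the given length.
     * A non-positive blockSize models a file system without block information
     * (an assumption made for this sketch).
     */
    public static List<long[]> splitByBlocks(long fileLength, long blockSize) {
        List<long[]> ranges = new ArrayList<>();
        if (blockSize <= 0 || fileLength <= blockSize) {
            // No block information, or the file fits in one block: one split per file.
            ranges.add(new long[] {0L, fileLength});
            return ranges;
        }
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last range may be shorter than a full block.
            long length = Math.min(blockSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : splitByBlocks(300L, 128L)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
        }
    }
}
```

In this sketch, the fallback branch corresponds to the "one source split per file" case in the note above.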
Files with suffixes corresponding to known compression formats (for example '.gzip', '.bz2', ...) will not be split. See StandardDeCompressors for a list of known formats and suffixes.
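The suffix rule can be pictured with a small standalone check. This is a minimal sketch under stated assumptions, not the library's code: `SplittableSuffixSketch` and `isSplittable` are hypothetical names, the suffixes are assumed to include the leading dot, and the two suffixes used in `main` are simply the examples named above rather than the full list from StandardDeCompressors.

```java
public class SplittableSuffixSketch {

    /**
     * Returns false when the path ends with one of the non-splittable suffixes.
     * An empty suffix array means every file is treated as splittable.
     */
    public static boolean isSplittable(String path, String[] nonSplittableSuffixes) {
        for (String suffix : nonSplittableSuffixes) {
            if (path.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative suffixes only, taken from the examples in the text.
        String[] suffixes = {".gzip", ".bz2"};
        System.out.println(isSplittable("logs/part-0.bz2", suffixes));
        System.out.println(isSplittable("logs/part-0.csv", suffixes));
    }
}
```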
Compared to BlockSplittingRecursiveEnumerator, this enumerator enumerates all files even if their parent directory is filtered out by the file filter.
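The difference can be sketched with plain `java.nio.file` rather than Flink's FileSystem API: the filter is applied to files only, so a directory that the filter would reject is still descended into, while hidden directories are skipped entirely. The names `AllDirEnumerationSketch`, `enumerate`, `isHidden`, and `demo` are hypothetical, and treating a leading "." as the marker for hidden directories is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class AllDirEnumerationSketch {

    /** Collects all files under root that pass the filter, skipping hidden directories. */
    public static List<Path> enumerate(Path root, Predicate<Path> fileFilter) throws IOException {
        List<Path> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Directories are never tested against fileFilter; only hidden ones are skipped.
                if (!dir.equals(root) && isHidden(dir)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // The filter is applied to files only.
                if (fileFilter.test(file)) {
                    result.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }

    /** Assumption for this sketch: a name starting with "." marks a hidden entry. */
    static boolean isHidden(Path path) {
        Path name = path.getFileName();
        return name != null && name.toString().startsWith(".");
    }

    /** Builds a small temporary tree and returns the matching paths relative to its root, sorted. */
    public static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("enum-sketch");
            Files.createDirectories(root.resolve("raw"));
            Files.createDirectories(root.resolve(".staging"));
            Files.createFile(root.resolve("top.csv"));
            Files.createFile(root.resolve("raw").resolve("data.csv"));
            Files.createFile(root.resolve("raw").resolve("notes.txt"));
            Files.createFile(root.resolve(".staging").resolve("tmp.csv"));

            // The directory "raw" does not match this filter, yet its files are still visited.
            Predicate<Path> csvOnly = p -> p.getFileName().toString().endsWith(".csv");

            List<String> names = new ArrayList<>();
            for (Path p : enumerate(root, csvOnly)) {
                names.add(root.relativize(p).toString().replace('\\', '/'));
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In `demo()`, the file under `raw` is found even though `raw` itself would not pass the filter, while the file under `.staging` is not, because that directory is treated as hidden.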
FileEnumerator.Provider
fileFilter
Constructor and Description |
---|
BlockSplittingRecursiveAllDirEnumerator(java.util.function.Predicate<Path> fileFilter, String[] nonSplittableFileSuffixes): Creates a new enumerator that uses the given predicate as a filter for file paths and avoids splitting files with the given suffixes (typically to avoid splitting compressed files). |
BlockSplittingRecursiveAllDirEnumerator(String pathPattern): Creates a new enumerator that enumerates all files whose path matches the regex, except hidden files. |
Modifier and Type | Method and Description |
---|---|
protected void | addSplitsForPath(FileStatus fileStatus, FileSystem fs, ArrayList<FileSourceSplit> target) |
convertToSourceSplits, isFileSplittable
enumerateSplits, getNextId
public BlockSplittingRecursiveAllDirEnumerator(String pathPattern)
The enumerator does not split files that have a suffix corresponding to a known compression format (for example '.gzip', '.bz2', '.xz', '.zip', ...). See StandardDeCompressors for details.
public BlockSplittingRecursiveAllDirEnumerator(java.util.function.Predicate<Path> fileFilter, String[] nonSplittableFileSuffixes)
protected void addSplitsForPath(FileStatus fileStatus, FileSystem fs, ArrayList<FileSourceSplit> target) throws IOException
Overrides: addSplitsForPath in class NonSplittingRecursiveEnumerator
Throws: IOException
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.