@InterfaceAudience.Public @InterfaceStability.Stable public class KeyValueTextInputFormat extends FileInputFormat<Text,Text> implements JobConfigurable
An InputFormat for plain text files. Files are broken into lines;
either a linefeed or a carriage return signals the end of a line. Each line
is divided into key and value parts by a separator byte. If no such byte
exists, the key is the entire line and the value is empty.

Fields inherited from class FileInputFormat: INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES

| Constructor and Description |
|---|
| KeyValueTextInputFormat() |
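
As a minimal sketch of the splitting rule in the class description, the following standalone Java mimics how one line could be divided at a separator byte. It does not use the Hadoop classes. `KeyValueSplitSketch` and `splitLine` are hypothetical names, and two details are assumptions rather than statements from this page: the separator is a tab, and the line is split at the first occurrence of the separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of the Hadoop API.
class KeyValueSplitSketch {

    /**
     * Splits one line into {key, value} at the first separator byte
     * (assumption: first occurrence). If the separator is absent, the key
     * is the entire line and the value is empty.
     */
    public static String[] splitLine(byte[] line, byte separator) {
        int pos = -1;
        for (int i = 0; i < line.length; i++) {
            if (line[i] == separator) {
                pos = i;
                break;
            }
        }
        if (pos == -1) {
            // No separator byte: whole line becomes the key.
            return new String[] { new String(line, StandardCharsets.UTF_8), "" };
        }
        String key = new String(line, 0, pos, StandardCharsets.UTF_8);
        String value = new String(line, pos + 1, line.length - pos - 1, StandardCharsets.UTF_8);
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        byte tab = (byte) '\t'; // assumed separator for this sketch
        byte[] withSep = "user42\tlogin ok".getBytes(StandardCharsets.UTF_8);
        byte[] withoutSep = "no separator here".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(splitLine(withSep, tab)));
        System.out.println(Arrays.toString(splitLine(withoutSep, tab)));
    }
}
```

Working on bytes rather than characters mirrors the description's wording ("separator byte"); a multi-byte separator would need a different search.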
| Modifier and Type | Method and Description |
|---|---|
| void | configure(JobConf conf): Initializes a new instance from a JobConf. |
| RecordReader<Text,Text> | getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter): Get the RecordReader for the given InputSplit. |
| protected boolean | isSplitable(FileSystem fs, Path file): Is the given filename splittable? Usually true, but if the file is stream compressed, it will not be. |

Methods inherited from class FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, listStatus, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize

public void configure(JobConf conf)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.
Specified by: configure in interface JobConfigurable
Parameters:
conf - the configuration

protected boolean isSplitable(FileSystem fs, Path file)
Description copied from class: FileInputFormat
Is the given filename splittable? Usually true, but if the file is
stream compressed, it will not be.
The default implementation in FileInputFormat always returns
true. Implementations that may deal with non-splittable files must
override this method.
FileInputFormat implementations can override this and return
false to ensure that individual input files are never split up,
so that Mappers process entire files.
Overrides: isSplitable in class FileInputFormat<Text,Text>
Parameters:
fs - the file system that the file is on
file - the file name to check

public RecordReader<Text,Text> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect
record boundaries while processing the logical split to present a
record-oriented view to the individual task.
Specified by: getRecordReader in interface InputFormat<Text,Text>
Specified by: getRecordReader in class FileInputFormat<Text,Text>
Parameters:
genericSplit - the InputSplit
job - the job that this split belongs to
Returns: a RecordReader
Throws: IOException

Copyright © 2024 Apache Software Foundation. All rights reserved.
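
The getRecordReader contract states that the RecordReader must respect record boundaries within a logical split. The standalone sketch below illustrates one way that can work for line records when a split boundary falls in the middle of a line. It is not Hadoop's implementation: `SplitBoundarySketch` and `readSplit` are hypothetical names, the ownership rule (a line belongs to the split in which its first byte falls) is an assumed convention, and for brevity only the linefeed is treated as a line terminator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
class SplitBoundarySketch {

    /**
     * Returns the lines owned by the byte range [start, end) of data,
     * assuming 0 <= start <= end <= data.length.
     * Assumed convention: a line belongs to the split containing its first byte.
     */
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // If the split begins in the middle of a line, skip the remainder of
        // that line; the previous split reads it to completion.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the linefeed
        }
        // Read every line that starts before `end`, even if it finishes after it.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "a\t1\nbb\t22\nccc\t333\n".getBytes(StandardCharsets.UTF_8);
        int cut = 6; // falls inside the second line
        System.out.println(readSplit(data, 0, cut));
        System.out.println(readSplit(data, cut, data.length));
    }
}
```

With this convention, the two splits together yield every line exactly once regardless of where the cut falls, which is the record-oriented view the contract asks for.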