MapReduce Tutorial : Custom input formats
Every custom format reading keys of type K and values of type V must subclass InputFormat<K, V>. Usually it is easier to subclass FileInputFormat<K, V>; file listing and splitting are then handled by FileInputFormat itself.
WholeFileInputFormat
We start by creating WholeFileInputFormat, which reads any file and returns exactly one input pair (input_path, file_content) with types (Text, BytesWritable). The format does not allow file splitting, so each file is processed by exactly one mapper.
The main functionality lies in WholeFileRecordReader, a subclass of RecordReader<Text, BytesWritable>.
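The "exactly one pair per file" behaviour can be illustrated with a minimal sketch that does not depend on Hadoop. The class name WholeFileReaderSketch is hypothetical, and plain String and byte[] stand in for Text and BytesWritable. Only the method names nextKeyValue, getCurrentKey, getCurrentValue and getProgress mirror those of Hadoop's RecordReader. A real WholeFileRecordReader would extend RecordReader<Text, BytesWritable> and would presumably obtain the file path from its input split in initialize rather than from a constructor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the reader; it does NOT extend Hadoop's RecordReader.
class WholeFileReaderSketch {
    private final Path file;            // the single, unsplit input file
    private String key;                 // plays the role of the Text key (input path)
    private byte[] value;               // plays the role of the BytesWritable value (content)
    private boolean processed = false;  // set once the only record has been produced

    WholeFileReaderSketch(Path file) {
        this.file = file;
    }

    // Returns true exactly once: the first call loads the whole file as one record.
    boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        key = file.toString();
        value = Files.readAllBytes(file);
        processed = true;
        return true;
    }

    String getCurrentKey() {
        return key;
    }

    byte[] getCurrentValue() {
        return value;
    }

    // Progress is all-or-nothing because there is only one record.
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("whole-file", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));

        WholeFileReaderSketch reader = new WholeFileReaderSketch(tmp);
        // The loop body runs once, mirroring how a mapper would see a single pair.
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " -> "
                    + reader.getCurrentValue().length + " bytes");
        }
        Files.delete(tmp);
    }
}
```

The processed flag is what makes the reader emit a single pair: the first call to nextKeyValue reads the entire file, and every later call reports that no records remain.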