MapReduce Tutorial : Custom input formats
Every custom format reading keys of type K and values of type V must subclass InputFormat<K, V>. Usually it is easier to subclass FileInputFormat<K, V>, because file listing and splitting are then handled by FileInputFormat itself.
WholeFileInputFormat
We start by creating WholeFileInputFormat, which reads any file and returns exactly one input pair (input_path, file_content) with types (Text, BytesWritable). The format does not allow file splitting, so each file is processed by exactly one mapper.
The main functionality lies in WholeFileRecordReader, a subclass of RecordReader<Text, BytesWritable>.
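The reading contract of such a record reader can be modelled in plain Java, without any Hadoop dependency. The sketch below is only an illustration, not the tutorial's implementation: the class name WholeFileReaderSketch is hypothetical, the key is a String instead of Text, and the value is a byte[] instead of BytesWritable. The method names mirror those of RecordReader (nextKeyValue, getCurrentKey, getCurrentValue, getProgress). A real WholeFileRecordReader would extend RecordReader<Text, BytesWritable> and take the file path from the input split passed to initialize.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Plain-Java model of a "whole file" record reader (hypothetical name,
// no Hadoop classes). It yields exactly one (path, content) pair per file.
class WholeFileReaderSketch {
    private final Path path;           // file to read
    private String key;                // stands in for the Text key
    private byte[] value;              // stands in for the BytesWritable value
    private boolean processed = false; // true once the single pair was produced

    WholeFileReaderSketch(Path path) {
        this.path = path;
    }

    // Mirrors RecordReader.nextKeyValue(): returns true for the first call,
    // after loading the whole file, and false for every later call.
    boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        key = path.toString();
        value = Files.readAllBytes(path);
        processed = true;
        return true;
    }

    String getCurrentKey() {
        return key;
    }

    byte[] getCurrentValue() {
        return value;
    }

    // There is only one record, so progress is either 0 or 1.
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }
}
```

A caller loops with while (reader.nextKeyValue()) and reads the current key and value inside the loop; for this reader the loop body runs exactly once per file. Because the whole file is held in memory as one value, this approach suits only files that fit comfortably in a mapper's heap.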
