
Institute of Formal and Applied Linguistics Wiki





MapReduce Tutorial : Custom input formats

Every custom format that reads keys of type K and values of type V must subclass InputFormat<K, V>. Usually it is easier to subclass FileInputFormat<K, V>, because FileInputFormat itself then takes care of listing the input files and splitting them.
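To make the InputFormat contract concrete, here is a stdlib-only Java model of it. This is a sketch, not Hadoop code. The hypothetical interfaces ModelInputFormat and ModelRecordReader borrow the method names of Hadoop's org.apache.hadoop.mapreduce.InputFormat and RecordReader (getSplits, createRecordReader, nextKeyValue, getCurrentKey, getCurrentValue), but they drop the context parameters and Writable types. The helper runMappers is a rough, single-process stand-in for how the framework consumes a format: it creates one reader per split and passes every pair to the mapper. In Hadoop, each split goes to its own map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Stdlib-only model of Hadoop's InputFormat / RecordReader contract.
// Method names mirror org.apache.hadoop.mapreduce; signatures are simplified.
class InputFormatModel {

    // Counterpart of RecordReader<K, V>: iterates over the pairs of one split.
    interface ModelRecordReader<K, V> {
        boolean nextKeyValue();   // advance; false once the split is exhausted
        K getCurrentKey();
        V getCurrentValue();
    }

    // Counterpart of InputFormat<K, V>; S is a hypothetical split type.
    interface ModelInputFormat<S, K, V> {
        List<S> getSplits();                                 // how the input is divided
        ModelRecordReader<K, V> createRecordReader(S split); // how one split is read
    }

    // One reader per split, every (key, value) pair handed to the mapper.
    static <S, K, V> void runMappers(ModelInputFormat<S, K, V> format, BiConsumer<K, V> mapper) {
        for (S split : format.getSplits()) {
            ModelRecordReader<K, V> reader = format.createRecordReader(split);
            while (reader.nextKeyValue()) {
                mapper.accept(reader.getCurrentKey(), reader.getCurrentValue());
            }
        }
    }

    // Toy format: each split is a list of strings; the key is the position
    // within the split and the value is the string itself.
    static ModelInputFormat<List<String>, Integer, String> listFormat(List<List<String>> splits) {
        return new ModelInputFormat<List<String>, Integer, String>() {
            @Override
            public List<List<String>> getSplits() {
                return splits;
            }

            @Override
            public ModelRecordReader<Integer, String> createRecordReader(List<String> split) {
                return new ModelRecordReader<Integer, String>() {
                    private int pos = -1;  // before the first record

                    @Override
                    public boolean nextKeyValue() {
                        return ++pos < split.size();
                    }

                    @Override
                    public Integer getCurrentKey() {
                        return pos;
                    }

                    @Override
                    public String getCurrentValue() {
                        return split.get(pos);
                    }
                };
            }
        };
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        runMappers(listFormat(List.of(List.of("a", "b"), List.of("c"))),
                   (k, v) -> seen.add(k + "=" + v));
        System.out.println(seen);  // [0=a, 1=b, 0=c]
    }
}
```

A custom format only has to supply the two methods of ModelInputFormat. Subclassing FileInputFormat in Hadoop corresponds to inheriting a ready-made getSplits, which leaves only the reader to write.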

WholeFileInputFormat

We start by creating WholeFileInputFormat, which reads any file and returns exactly one input pair (input_path, file_content) with types (Text, BytesWritable). The format does not allow file splitting, so each file is processed by exactly one mapper.
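The effect of disallowing splits can be sketched with a simplified model of FileInputFormat's split computation. In Hadoop's new API, WholeFileInputFormat would typically do this by overriding FileInputFormat's protected isSplitable(JobContext, Path) method so that it returns false. The SplitModel class and its splitsFor helper below are hypothetical. The real computation also takes the block size and the configured minimum and maximum split sizes into account, and this sketch omits those details.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how a FileInputFormat cuts one file into
// splits, and what changes when the format is not splittable.
class SplitModel {

    // What a file split essentially records: the file, the start offset, the length in bytes.
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    static List<Split> splitsFor(String path, long fileLength, long splitSize, boolean splittable) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        if (!splittable) {
            // Non-splittable format (isSplitable returns false): a single split
            // covering the whole file, hence exactly one mapper per file.
            splits.add(new Split(path, 0, fileLength));
            return splits;
        }
        // Splittable format: consecutive chunks of at most splitSize bytes.
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitsFor("input/a.txt", 250, 100, true));
        // [input/a.txt:0+100, input/a.txt:100+100, input/a.txt:200+50]
        System.out.println(splitsFor("input/a.txt", 250, 100, false));
        // [input/a.txt:0+250]
    }
}
```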

The main functionality lies in WholeFileRecordReader, a subclass of RecordReader<Text, BytesWritable>.
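The reader's logic can be sketched without Hadoop on the classpath. In a real WholeFileRecordReader, initialize(InputSplit, TaskAttemptContext) would take the path from the FileSplit, and nextKeyValue() would read the file through Hadoop's FileSystem API into a BytesWritable. In the hypothetical WholeFileRecordReaderModel below, a String and a byte[] stand in for Text and BytesWritable, and java.nio.file does the reading. The essential part is the processed flag: nextKeyValue() returns true exactly once, and that is what makes the format emit one pair per file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stdlib-only model of WholeFileRecordReader.
// String stands in for Text and byte[] for BytesWritable.
class WholeFileRecordReaderModel {
    private Path path;
    private boolean processed = false;  // has the single pair been produced yet?
    private String key;
    private byte[] value;

    // In Hadoop the path would come from the FileSplit passed to initialize().
    void initialize(Path path) {
        this.path = path;
        this.processed = false;
    }

    // Produces exactly one pair per file: true on the first call, false afterwards.
    boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        key = path.toString();
        value = Files.readAllBytes(path);  // the whole file, since it is never split
        processed = true;
        return true;
    }

    String getCurrentKey() {
        return key;
    }

    byte[] getCurrentValue() {
        return value;
    }

    // Progress jumps from 0 to 1 once the file has been read.
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }

    public static void main(String[] args) throws IOException {
        // Prints one line per file given on the command line.
        for (String arg : args) {
            WholeFileRecordReaderModel reader = new WholeFileRecordReaderModel();
            reader.initialize(Paths.get(arg));
            while (reader.nextKeyValue()) {
                System.out.println(reader.getCurrentKey() + ": "
                        + reader.getCurrentValue().length + " bytes");
            }
        }
    }
}
```

Because the whole file is held in memory as a single value, this approach suits inputs where each file comfortably fits in a mapper's heap.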

