Differences
This shows you the differences between two versions of the page.
Both sides previous revision
Previous revision
Next revision
|
Previous revision
Next revision
Both sides next revision
|
courses:mapreduce-tutorial:step-29 [2012/01/29 17:23] straka |
courses:mapreduce-tutorial:step-29 [2012/01/31 14:41] straka |
====== MapReduce Tutorial : Custom input formats ====== | ====== MapReduce Tutorial : Custom sorting and grouping comparators. ====== |
| |
Every custom format reading keys of type ''K'' and values of type ''V'' must subclass [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/InputFormat.html|InputFormat<K, V>]]. Usually it is easier to subclass [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html|FileInputFormat<K, V>]] -- the file listing and splitting is then solved by the ''FileInputFormat'' itself. | |
| |
===== WholeFileInputFormat ===== | ---- |
| |
We start by creating ''WholeFileInputFormat'', which reads any file and return exactly one input pair (input_path, file_content) with types (''Text'', ''BytesWritable''). The format does not allow file splitting -- each file will be processed by exactly one mapper. | |
| |
The main functionality lays in ''WholeFileRecordReader'', a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/RecordReader.html|RecordReader<Text, BytesWritable]]. | |
| |
| <html> |
| <table style="width:100%"> |
| <tr> |
| <td style="text-align:left; width: 33%; "></html>[[step-28|Step 28]]: Custom data types.<html></td> |
| <td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> |
| <td style="text-align:right; width: 33%; "></html>[[step-30|Step 30]]: Custom input formats.<html></td> |
| </tr> |
| </table> |
| </html> |