
Institute of Formal and Applied Linguistics Wiki






courses:mapreduce-tutorial:step-29 [2012/01/29 17:44] straka (previous revision: [2012/01/29 17:42] straka)
When implementing a new input format, we must
  * decide whether the input files are splittable. Usually, uncompressed files are splittable and compressed files are not, with the exception of ''SequenceFile'', which is always splittable.
  * implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/RecordReader.html|RecordReader<K, V>]]. The ''RecordReader'' does the real work: it is given a file split and reads (key, value) pairs of types (K, V) until there are none left.
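Both requirements can be illustrated without a Hadoop installation. The following is a minimal, Hadoop-free Java sketch under stated assumptions: the names ''MiniRecordReader'', ''LineSplitReader'', ''isSplittable'' and ''readSplit'' are hypothetical, the interface only imitates the iteration contract of Hadoop's ''RecordReader'' (call ''nextKeyValue()'' until it returns false, reading ''getCurrentKey()'' and ''getCurrentValue()'' after each call), and a ''.gz'' suffix stands in for real compression detection. The reader produces (byte offset, line) pairs from one split and assigns each line to the split in which it starts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch. All names here are hypothetical; they only imitate the
// shape of org.apache.hadoop.mapreduce.RecordReader and FileInputFormat.isSplitable.
final class InputFormatSketch {

  // Same iteration contract as Hadoop's RecordReader<K, V>: call
  // nextKeyValue() until it returns false, reading the current pair each time.
  interface MiniRecordReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
  }

  // Assumed rule: a ".gz" suffix stands in for "compressed";
  // SequenceFiles are splittable regardless of compression.
  static boolean isSplittable(String fileName, boolean isSequenceFile) {
    if (isSequenceFile) {
      return true;
    }
    return !fileName.endsWith(".gz");
  }

  // Reads (byte offset, line) pairs from the byte range [start, end) of a file.
  // A line belongs to the split in which it starts, so every line is read
  // exactly once even when a split boundary falls in the middle of it.
  static final class LineSplitReader implements MiniRecordReader<Long, String> {
    private final byte[] data;
    private final int end;
    private int pos;
    private Long key;
    private String value;

    LineSplitReader(byte[] data, int start, int end) {
      this.data = data;
      this.end = end;
      int p = start;
      if (start != 0) {
        // Back up one byte and discard everything up to and including the next
        // newline: that (possibly partial) line belongs to the previous split.
        p = start - 1;
        while (p < data.length && data[p] != '\n') {
          p++;
        }
        p++;
      }
      this.pos = p;
    }

    @Override
    public boolean nextKeyValue() {
      if (pos >= end || pos >= data.length) {
        return false; // no further line starts inside this split
      }
      int lineEnd = pos;
      while (lineEnd < data.length && data[lineEnd] != '\n') {
        lineEnd++;
      }
      key = (long) pos;
      value = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
      pos = lineEnd + 1; // continue after the newline
      return true;
    }

    @Override
    public Long getCurrentKey() { return key; }

    @Override
    public String getCurrentValue() { return value; }
  }

  // Drains one split, formatting each pair as "offset:line".
  static List<String> readSplit(byte[] data, int start, int end) {
    List<String> pairs = new ArrayList<>();
    MiniRecordReader<Long, String> reader = new LineSplitReader(data, start, end);
    while (reader.nextKeyValue()) {
      pairs.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
    }
    return pairs;
  }

  public static void main(String[] args) {
    byte[] file = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
    // The split boundary at byte 8 falls inside "beta".
    System.out.println(readSplit(file, 0, 8));           // [0:alpha, 6:beta]
    System.out.println(readSplit(file, 8, file.length)); // [11:gamma]
  }
}
```

Here the boundary at byte 8 cuts "beta" in half, yet the first reader returns it whole and the second reader skips its tail, so each line is produced exactly once. This "a line belongs to the split where it starts" convention is the usual way line-oriented readers cope with split boundaries; the code is an illustration, not Hadoop's own implementation.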
  
Our ''FileAsPathInputFormat'' is simple: we allow splitting of uncompressed files, and the ''RecordReader'' reads exactly one input pair.
<code java>
public static class FileAsPathInputFormat extends FileInputFormat<Text, Text> {
