Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-29 [2012/01/29 17:42]
straka
+++ courses:mapreduce-tutorial:step-29 [2012/01/29 17:44]
straka
@@ Line 9: / Line 9: @@
 When implementing a new input format, we must
   * decide whether the input files are splittable. Usually uncompressed files are splittable and compressed files are not, with the exception of ''SequenceFile'', which is always splittable.
-  * implement
+  * implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/RecordReader.html|RecordReader<K, V>]]. The ''RecordReader'' is the one doing the real work -- it is given a file split and it reads (key, value) pairs of types (K, V) for as long as any remain.
-When
+Our ''FileAsPathInputFormat'' is simple -- we allow splitting of uncompressed files and the ''RecordReader'' reads exactly one input pair.
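The "reads exactly one input pair" behaviour can be illustrated without Hadoop on the classpath. The following is only a plain-Java sketch: ''PairReader'', ''SinglePairReader'' and ''countPairs'' are hypothetical names, not Hadoop types, and the interface merely borrows three method names (''nextKeyValue'', ''getCurrentKey'', ''getCurrentValue'') from the real ''RecordReader''. Using the file path as the key and an empty value is an assumption made for the example, not something the page states.

```java
// Plain-Java sketch of a reader that yields exactly one (key, value) pair.
// Nothing here is a Hadoop class; all names are hypothetical stand-ins.
class OnePairReaderSketch {

    /** Hypothetical interface borrowing three method names from Hadoop's RecordReader<K, V>. */
    interface PairReader<K, V> {
        boolean nextKeyValue();   // advance to the next pair; false once none remain
        K getCurrentKey();        // key of the pair most recently advanced to
        V getCurrentValue();      // value of the pair most recently advanced to
    }

    /** Yields its single pair on the first call to nextKeyValue(), then reports exhaustion. */
    static class SinglePairReader implements PairReader<String, String> {
        private final String key;
        private final String value;
        private boolean consumed = false;

        SinglePairReader(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public boolean nextKeyValue() {
            if (consumed) {
                return false;     // the only pair was already handed out
            }
            consumed = true;
            return true;
        }

        @Override
        public String getCurrentKey() {
            return key;
        }

        @Override
        public String getCurrentValue() {
            return value;
        }
    }

    /** Drains a reader the way a framework loop would and counts the pairs it yields. */
    static int countPairs(PairReader<?, ?> reader) {
        int pairs = 0;
        while (reader.nextKeyValue()) {
            pairs++;
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumption for illustration: key = path of the file, value = empty string.
        PairReader<String, String> reader = new SinglePairReader("input/part-00000", "");
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
    }
}
```

The ''consumed'' flag is the whole mechanism: the first call to ''nextKeyValue'' returns true, every later call returns false, so the consuming loop runs exactly once per reader.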
 <code java>
 public static class FileAsPathInputFormat extends FileInputFormat<Text, Text> {
