
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-29 (revisions of 2012/01/29 17:40 and 17:42, straka)
We start by creating ''FileAsPathInputFormat''. It reads any file, splits it, and for each split returns exactly one input pair (file_path, start-length) with types (''Text'', ''Text''). Here ''file_path'' is the path to the file, and ''start-length'' is a string containing two dash-separated numbers: the start offset of the split and the length of the split.
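The following plain-Java sketch makes the pair format concrete. It does not use the Hadoop API. It cuts a file into consecutive splits and produces one (file_path, start-length) pair per split. It also includes the inverse parsing a mapper would need. The class ''FileAsPathSplitsSketch'', the methods ''splitPairs'' and ''parseStartLength'', the fixed split size and the example path ''/data/input.txt'' are all hypothetical.

Hadoop's own ''FileInputFormat'' derives split sizes from the block size and job configuration, which this sketch ignores. A real input format would also emit ''Text'' objects rather than ''String''s.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FileAsPathSplitsSketch {
    // Hypothetical helper: cuts a file of fileLength bytes into consecutive
    // splits of at most splitSize bytes and returns one (file_path, "start-length")
    // pair per split. A simplification, not Hadoop's split computation.
    static List<Map.Entry<String, String>> splitPairs(String filePath, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last split may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - start);
            pairs.add(new SimpleEntry<>(filePath, start + "-" + length));
        }
        return pairs;
    }

    // Hypothetical inverse for the mapper side: recovers {start, length}
    // from a "start-length" value.
    static long[] parseStartLength(String value) {
        String[] parts = value.split("-", 2);
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    public static void main(String[] args) {
        // Example: a 250-byte file cut into splits of at most 100 bytes.
        for (Map.Entry<String, String> pair : splitPairs("/data/input.txt", 250, 100)) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Running ''main'' prints three pairs for the 250-byte example file: ''0-100'', ''100-100'' and ''200-50'', each keyed by the same path.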
  
When implementing a new input format, we must:
  * decide whether the input files are splittable. Usually, uncompressed files are splittable and compressed files are not, with the exception of ''SequenceFile'', which is always splittable.
  * implement
When
  
