Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-2 [2012/01/25 00:32] straka
courses:mapreduce-tutorial:step-2 [2012/01/25 21:49] straka
Line 5:
  
 ===== Input formats =====
-  * ''TextInputFormat'' -- values are lines of UTF8 plain text files, keys are the positions of their first character in the file
+  * ''TextInputFormat'' -- values are lines of UTF8 plain text files, keys are the positions of their first character in the file.
-  * ''KeyValueTextInputFormat'' -- every line of UTF8 plain text file is split using first TAB character, forming key and value. If there is no TAB character, value is empty
+  * ''KeyValueTextInputFormat'' -- every line of UTF8 plain text file is split using first TAB character, forming key and value. If there is no TAB character, the value is empty.
-  * ''SequenceFileInputFormat'' -- binary format
+  * ''SequenceFileInputFormat'' -- binary format.
 The input format can be compressed and will be decompressed transparently by the MR framework.
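
The key and value rules of the two text input formats can be illustrated with a small Python sketch. It only simulates the described behaviour on an in-memory byte string and is not the framework's own code. The function names ''text_input_records'' and ''key_value_text_records'' are hypothetical, and the sketch assumes that "position" means the byte offset of the start of the line.

```python
def text_input_records(data: bytes):
    """Simulate TextInputFormat: yield (offset, line) pairs.

    Assumption: the key is the byte offset of the line's first character.
    """
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The value is the line without its terminator, decoded as UTF-8.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_text_records(data: bytes):
    """Simulate KeyValueTextInputFormat: split each line at the first TAB."""
    for _, line in text_input_records(data):
        # partition() splits at the first TAB only; without a TAB the
        # whole line becomes the key and the value is the empty string.
        key, _sep, value = line.partition("\t")
        yield key, value


if __name__ == "__main__":
    sample = "k1\tv1\tmore\ncafé\nz\n".encode("utf-8")
    for record in text_input_records(sample):
        print(record)
    for record in key_value_text_records(sample):
        print(record)
```

Under the byte-offset assumption, a multi-byte UTF-8 character in one line shifts the keys of all following lines by more than one per character.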
  
 ===== Output formats =====
-  * ''TextOutputFormat'' -- (key, value) pair is printed in UTF8 on one line separated by a TAB character. If key or value is empty, no TAB character is used.
+  * ''TextOutputFormat'' -- (key, value) pair is printed using UTF8 on one line separated by a TAB character. If key or value is empty, no TAB character is used.
-  * ''SequenceFileOutputFormat'' -- binary format
+  * ''SequenceFileOutputFormat'' -- binary format.
 The output format can be compressed on demand.
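
The line layout described for ''TextOutputFormat'' can be sketched in Python as follows. This is an illustration under stated assumptions, not the framework's implementation. The function name ''text_output_line'' is hypothetical, an empty string stands in for an "empty" key or value, and returning an empty line when both are empty is a choice made for this sketch only.

```python
def text_output_line(key: str, value: str) -> str:
    """Format one (key, value) pair as a TextOutputFormat-style line.

    Assumption: "empty" is modelled as the empty string.
    """
    if key and value:
        # Both parts present: join them with a single TAB character.
        return f"{key}\t{value}"
    # At least one part is empty: print the other one without a TAB.
    return key or value


if __name__ == "__main__":
    pairs = [("word", "3"), ("only-key", ""), ("", "only-value")]
    for k, v in pairs:
        # Each record occupies one line, encoded as UTF-8 when written out.
        print(text_output_line(k, v))
```

A line with both parts present has the same shape that ''KeyValueTextInputFormat'' expects, so text output written this way can be read back as (key, value) pairs, provided the key itself contains no TAB character.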
  
