Institute of Formal and Applied Linguistics Wiki

Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-29 [2012/01/30 00:50]
straka
courses:mapreduce-tutorial:step-29 [2012/01/30 15:47]
majlis
Line 121: Line 121:
  
 The ''ParagraphTextInputFormat'' should allow splitting of uncompressed files. Be careful to properly implement reading paragraphs which are on split boundary. The easiest way of doing so is the following:
-  * if the offset of the split is 0, start reading at the beginning of the split. If the offset of the split is larger than 0, start reading from the offset and ignore first paragraph found.
+  * if the offset of the split is 0, start reading at the beginning of the split. If the offset of the split is larger than 0, start reading at the offset and ignore first paragraph found.
-  * read all paragraphs that start
+  * read all paragraphs that start before the end of the split boundary, even if they end after the split boundary. //If a paragraph starts just after the current split (i.e., on the split boundary), read it too.//
 +It is simple to verify that with these rules, all paragraphs are read exactly once. 
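The two rules above can be checked with a small in-memory simulation. The sketch below is not the tutorial's ''ParagraphTextInputFormat'' and does not use the Hadoop API; ''read_split'' and ''read_all_splits'' are hypothetical helper names. It assumes that paragraphs are separated by one or more completely empty lines, that lines end with ''\n'', and that the reader may look at the character just before the split offset to decide whether the offset falls on a line start.

```python
from typing import List, Tuple


def read_split(text: str, offset: int, length: int) -> List[Tuple[int, str]]:
    """Return (start, paragraph) pairs read by the split [offset, offset + length)."""
    end = offset + length
    n = len(text)
    pos = offset

    def next_line(p: int) -> Tuple[str, int]:
        """Return the line starting at p (without newline) and the position after it."""
        nl = text.find("\n", p)
        if nl == -1:
            return text[p:], n
        return text[p:nl], nl + 1

    if offset > 0:
        # Align to a line start. If the previous character is not a newline,
        # the offset is inside a non-empty line, so finish that line first.
        if text[offset - 1] != "\n":
            _, pos = next_line(pos)
        # Rule 1: ignore the first paragraph found. It started at or before
        # the offset, so the previous split is responsible for it.
        while pos < n:
            line, after = next_line(pos)
            if line == "":
                break
            pos = after

    result = []
    while pos < n:
        line, after = next_line(pos)
        if line == "":
            pos = after  # skip empty lines between paragraphs
            continue
        start = pos
        # Rule 2: read paragraphs starting before or exactly on the split
        # boundary; anything starting later belongs to the next split.
        if start > end:
            break
        lines = []
        while pos < n:
            line, after = next_line(pos)
            if line == "":
                break
            lines.append(line)
            pos = after
        result.append((start, "\n".join(lines)))
    return result


def read_all_splits(text: str, split_size: int) -> List[Tuple[int, str]]:
    """Cut the text into consecutive splits and concatenate what each split reads."""
    out = []
    for offset in range(0, len(text), split_size):
        out.extend(read_split(text, offset, min(split_size, len(text) - offset)))
    return out


if __name__ == "__main__":
    sample = "alpha\nbeta\n\ngamma\n\n\ndelta\nepsilon\nzeta\n\neta"
    whole = read_split(sample, 0, len(sample))
    # Every split size must yield each paragraph exactly once, in order.
    for size in range(1, len(sample) + 1):
        assert read_all_splits(sample, size) == whole
```

Under these assumptions a paragraph with start position ''s'' is read by exactly the split whose offset ''o'' and end ''e'' satisfy ''o < s <= e'' (or by the first split when ''s'' is 0), which is why no paragraph is lost or duplicated. A real record reader working on a byte stream would need its own way of detecting whether the split offset is a line start.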
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-28|Step 28]]: Running multiple Hadoop jobs in one class.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-30|Step 30]]: Implementing iterative MapReduce jobs faster using All-Reduce.<html></td> 
 +</tr> 
 +</table> 
 +</html>