Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-29 [2012/01/30 00:50]
straka
+++ courses:mapreduce-tutorial:step-29 [2012/01/30 15:47]
majlis
@@ Line 121: / Line 121: @@
 The ''ParagraphTextInputFormat'' should allow splitting of uncompressed files. Be careful to properly implement reading paragraphs which are on split boundary. The easiest way of doing so is the following:
-  * if the offset of the split is 0, start reading at the beginning of the split. If the offset of the split is larger than 0, start reading from the offset and ignore first paragraph found.
+  * if the offset of the split is 0, start reading at the beginning of the split. If the offset of the split is larger than 0, start reading at the offset and ignore the first paragraph found.
-  * read all paragraphs that start
+  * read all paragraphs that start before the end of the split, even if they end after the split boundary. //If a paragraph starts just after the current split (i.e., on the split boundary), read it too.//
+It is simple to verify that with these rules, all paragraphs are read exactly once.
+----
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-28|Step 28]]: Running multiple Hadoop jobs in one class.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-30|Step 30]]: Implementing iterative MapReduce jobs faster using All-Reduce.<html></td>
+</tr>
+</table>
+</html>
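The splitting rules above can be checked with a small simulation. The following is a minimal Python sketch, not Hadoop code and not the tutorial's reference solution. It assumes that a paragraph is a run of non-empty lines, that paragraphs are separated by one or more empty lines, and that the empty lines after a paragraph count as part of it, so "the first paragraph found" at an offset is the (possibly partial) paragraph containing that offset. It also assumes that offsets are character positions rather than bytes. The names ''paragraph_starts'', ''read_split'', ''reads_each_paragraph_once'', ''include_boundary'' and ''SAMPLE'' are hypothetical. In a real ''ParagraphTextInputFormat'', the same logic would typically live in its Java ''RecordReader''.

```python
import bisect


def paragraph_starts(text):
    """Return the character offset at which each paragraph begins.

    Assumed model: a paragraph is a run of non-empty lines; paragraphs are
    separated by one or more empty lines, which count as part of the
    paragraph before them.
    """
    starts = []
    pos = 0
    prev_empty = True  # the beginning of the text acts like a preceding empty line
    for line in text.split("\n"):
        if line and prev_empty:
            starts.append(pos)
        prev_empty = line == ""
        pos += len(line) + 1  # +1 for the newline removed by split()
    return starts


def read_split(starts, split_start, split_end, include_boundary=True):
    """Return indices of the paragraphs emitted by the split [split_start, split_end)."""
    if split_start == 0:
        i = 0  # offset 0: start reading at the beginning of the split
    else:
        # Offset > 0: start at the offset and ignore the first (possibly
        # partial) paragraph found there, i.e. continue with the first
        # paragraph that starts strictly after split_start.
        i = bisect.bisect_right(starts, split_start)
    # Read paragraphs starting before split_end, even if they end after it;
    # with include_boundary, also read the one starting exactly at split_end.
    limit = split_end + 1 if include_boundary else split_end
    emitted = []
    while i < len(starts) and starts[i] < limit:
        emitted.append(i)
        i += 1
    return emitted


def reads_each_paragraph_once(text, split_size, include_boundary=True):
    """Cut text into fixed-size splits and check every paragraph is emitted once, in order."""
    starts = paragraph_starts(text)
    emitted = []
    for split_start in range(0, len(text), split_size):
        split_end = min(split_start + split_size, len(text))
        emitted += read_split(starts, split_start, split_end, include_boundary)
    return emitted == list(range(len(starts)))


# Hypothetical sample: paragraphs start at offsets 0, 29 and 38.
SAMPLE = "first paragraph\nstill first\n\nsecond\n\n\nthird paragraph\nends here\n"

if __name__ == "__main__":
    for size in range(1, len(SAMPLE) + 1):
        assert reads_each_paragraph_once(SAMPLE, size)
    print("every split size reads each paragraph exactly once")
```

Setting ''include_boundary=False'' drops the italicized rule. With 29-character splits of ''SAMPLE'', the paragraph ''second'' starts exactly at offset 29. The first split stops before it, and the second split ignores it as its first paragraph found, so neither split reads it.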


Institute of Formal and Applied Linguistics Wiki
