Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-27 [2012/01/28 20:02] straka
+++ courses:mapreduce-tutorial:step-27 [2012/01/31 13:13] straka
@@ Line 24: / Line 24: @@
   public void write(DataOutput out) throws IOException {
-    int highest_pos = 28;
+    int mask_shift = 28;
-    while (highest_pos > 0 && (value & (0x7F << highest_pos)) == 0) highest_pos -= 7;
+    while (mask_shift > 0 && (value & (0x7F << mask_shift)) == 0) mask_shift -= 7;
-    while (highest_pos > 0) {
+    while (mask_shift > 0) {
-      out.writeByte(0x80 | ((value >> highest_pos) & 0x7F));
+      out.writeByte(0x80 | ((value >> mask_shift) & 0x7F));
-      highest_pos -= 7;
+      mask_shift -= 7;
     }
     out.writeByte(value & 0x7F);
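
To see what the renamed loop produces, here is a minimal standalone sketch that runs the same ''write'' logic against an in-memory stream. The class ''BerIntDemo'', the helpers ''encode'' and ''decode'', and the reading routine ''readBer'' are assumptions made for illustration; the diff shows only the writing side, so the decoder is one plausible counterpart and not necessarily the tutorial's ''readFields''.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the tutorial sources.
class BerIntDemo {
  // Same loop as in the hunk above, with the value passed in explicitly.
  static void writeBer(int value, DataOutput out) throws IOException {
    int mask_shift = 28;
    // Skip leading 7-bit groups that are all zero.
    while (mask_shift > 0 && (value & (0x7F << mask_shift)) == 0) mask_shift -= 7;
    // Every group except the last one carries the continuation bit 0x80.
    while (mask_shift > 0) {
      out.writeByte(0x80 | ((value >> mask_shift) & 0x7F));
      mask_shift -= 7;
    }
    out.writeByte(value & 0x7F);
  }

  // Assumed counterpart: accumulate 7-bit groups until a byte without 0x80 arrives.
  static int readBer(DataInput in) throws IOException {
    int value = 0;
    int next;
    while (((next = in.readByte()) & 0x80) != 0) {
      value = (value << 7) | (next & 0x7F);
    }
    return (value << 7) | (next & 0x7F);
  }

  static byte[] encode(int value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeBer(value, new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static int decode(byte[] data) {
    try {
      return readBer(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringBuilder hex = new StringBuilder();
    for (byte b : encode(300)) hex.append(String.format("%02x ", b & 0xFF));
    System.out.println("300 -> " + hex.toString().trim());
    System.out.println("decoded: " + decode(encode(300)));
  }
}
```

With this sketch, 300 is encoded as the two bytes ''0x82 0x2C'': the first byte holds the upper 7-bit group with the continuation bit set, the second holds the lowest group. Values up to 127 take a single byte.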
@@ Line 117: / Line 117: @@
 ===== Exercise =====
+Imagine you want to create an inverted index. For each word and each document containing that word, the index has to store all positions of the word in the document.
+Create a type ''DocWithOccurences<Doctype extends WritableComparable>'' implementing ''WritableComparable''. The type:
+  * stores a document of type ''Doctype''.
+  * stores a list of positions of occurrence. A sequence of length //N// should be stored on disk as the number //N// followed by //N// numbers -- the positions of occurrence. The type ''BERIntWritable'' should be used for these numbers.
+  * is comparable, with the comparison delegated to the ''Comparable'' interface of ''Doctype''.
+  * has methods ''getDoc'', ''setDoc'', ''getOccurrences'', ''addOccurence'', ''toString''.
+Using this type, create an inverted index -- implement a Hadoop job that for each word creates a //sorted// list of ''DocWithOccurences<Text>'' holding the documents that contain this word, including the occurrences.
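
The on-disk layout described in the exercise (the count //N// followed by //N// positions) can be sketched without any Hadoop dependency. This is only an illustration of the layout under stated assumptions: the class ''OccurrenceListSketch'' and all of its methods are hypothetical, plain ''DataOutput''/''DataInput'' stand in for the Hadoop ''Writable'' machinery, and a local variable-length integer routine stands in for ''BERIntWritable''. An actual solution would also serialize the ''Doctype'' document and implement ''compareTo''.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing only the "N followed by N numbers" layout.
class OccurrenceListSketch {
  // Stand-in for BERIntWritable.write: 7 bits per byte, 0x80 set on all but the last byte.
  static void writeBer(int value, DataOutput out) throws IOException {
    int mask_shift = 28;
    while (mask_shift > 0 && (value & (0x7F << mask_shift)) == 0) mask_shift -= 7;
    while (mask_shift > 0) {
      out.writeByte(0x80 | ((value >> mask_shift) & 0x7F));
      mask_shift -= 7;
    }
    out.writeByte(value & 0x7F);
  }

  // Assumed matching reader for the encoding above.
  static int readBer(DataInput in) throws IOException {
    int value = 0;
    int next;
    while (((next = in.readByte()) & 0x80) != 0) {
      value = (value << 7) | (next & 0x7F);
    }
    return (value << 7) | (next & 0x7F);
  }

  // Write the length N first, then the N positions.
  static void writeOccurrences(List<Integer> occurrences, DataOutput out) throws IOException {
    writeBer(occurrences.size(), out);
    for (int position : occurrences) writeBer(position, out);
  }

  // Read N, then exactly N positions.
  static List<Integer> readOccurrences(DataInput in) throws IOException {
    int n = readBer(in);
    List<Integer> occurrences = new ArrayList<>();
    for (int i = 0; i < n; i++) occurrences.add(readBer(in));
    return occurrences;
  }

  static byte[] toBytes(List<Integer> occurrences) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeOccurrences(occurrences, new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static List<Integer> fromBytes(byte[] data) {
    try {
      return readOccurrences(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the count comes first, the reader knows exactly how many numbers to consume and needs no terminator, so several such records can follow each other in one stream.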
+----
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-28|Step 28]]: Running multiple Hadoop jobs in one source file.<html></td>
+</tr>
+</table>
+</html>


Institute of Formal and Applied Linguistics Wiki
