--- courses:mapreduce-tutorial:step-28 [2012/01/31 14:39]
straka
+++ courses:mapreduce-tutorial:step-28 [2012/02/05 19:08]
straka
@@ Line 126: / Line 126: @@
   * has methods ''getDoc'', ''setDoc'', ''getOccurrences'', ''addOccurence'', ''toString''.
-Using this type, create an inverted index -- implement a Hadoop job, that for each word creates a list of ''DocWithOccurences<Text>'' containing the documents containing this word, including the occurences.
+Using this type, create an inverted index -- implement a Hadoop job that, for each word, creates a list of ''DocWithOccurences<Text>'' values, one for each document containing the word, together with the word's occurrences in that document.
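
To make the expected output of Exercise 1 concrete, here is a minimal plain-Java sketch of the same data flow, run in memory rather than as a Hadoop job. It is not the tutorial's solution: the class ''InvertedIndexSketch'' and the helper type ''DocEntry'' are hypothetical stand-ins for the job and for ''DocWithOccurences<Text>'', and it assumes that an "occurrence" is the zero-based position of the word in the document and that words are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {
    /** Hypothetical stand-in for DocWithOccurences<Text>: a document name plus word positions. */
    static final class DocEntry {
        final String doc;
        final List<Integer> occurrences = new ArrayList<>();

        DocEntry(String doc) {
            this.doc = doc;
        }

        @Override
        public String toString() {
            return doc + occurrences;
        }
    }

    /** Builds word -> list of DocEntry from a map of document name -> document text. */
    static Map<String, List<DocEntry>> build(Map<String, String> docs) {
        Map<String, List<DocEntry>> index = new TreeMap<>();
        // Iterate documents in name order so that each word's list is sorted by document.
        for (Map.Entry<String, String> e : new TreeMap<>(docs).entrySet()) {
            String[] words = e.getValue().trim().split("\\s+");
            // "Map" side: collect the positions of every word within this one document.
            Map<String, DocEntry> perDoc = new TreeMap<>();
            for (int pos = 0; pos < words.length; pos++) {
                if (words[pos].isEmpty()) {
                    continue; // an empty document yields a single empty token
                }
                perDoc.computeIfAbsent(words[pos], w -> new DocEntry(e.getKey()))
                      .occurrences.add(pos);
            }
            // "Reduce" side: append this document's entry to the list of each word.
            for (Map.Entry<String, DocEntry> w : perDoc.entrySet()) {
                index.computeIfAbsent(w.getKey(), k -> new ArrayList<>()).add(w.getValue());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("a.txt", "to be or not to be");
        docs.put("b.txt", "be quick");
        System.out.println(build(docs).get("be")); // [a.txt[1, 5], b.txt[0]]
    }
}
```

In a real job the per-document loop would roughly correspond to the mapper and the per-word grouping to the reducer; the sketch only illustrates what each word's list should contain.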
 ===== Exercise 2 =====
-Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''DocWithOccurences<IntWritable>''. Your solution should use two Hadoop jobs:
+Optional. Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''DocWithOccurences<IntWritable>''. Your solution should use two Hadoop jobs:
   - in the first job, create a list of unique document names. Number the documents using the order in this list.
   - in the second job, create for each word a list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (unlike in the previous exercise, where ''Text'' was used to identify the document).
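
The two steps above can be sketched in plain Java as two in-memory passes. This is only an illustration under stated assumptions, not a Hadoop implementation: the class ''DocIdIndexSketch'' and its methods are hypothetical, document numbers start at 0, the "order in this list" is taken to be the sorted order of the unique names, and an occurrence is assumed to be the zero-based word position.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class DocIdIndexSketch {
    /** Pass 1: deduplicate and sort the document names, then number them by list order. */
    static Map<String, Integer> numberDocuments(Collection<String> names) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String name : new TreeSet<>(names)) {
            ids.put(name, ids.size()); // assumption: numbering starts at 0
        }
        return ids;
    }

    /** Pass 2: build word -> (document number -> positions), using the numbers from pass 1. */
    static Map<String, Map<Integer, List<Integer>>> buildIndex(
            Map<String, String> docs, Map<String, Integer> ids) {
        Map<String, Map<Integer, List<Integer>>> index = new TreeMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            int id = ids.get(e.getKey());
            String[] words = e.getValue().trim().split("\\s+");
            for (int pos = 0; pos < words.length; pos++) {
                if (words[pos].isEmpty()) {
                    continue; // an empty document yields a single empty token
                }
                index.computeIfAbsent(words[pos], w -> new TreeMap<>())
                     .computeIfAbsent(id, d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("b.txt", "be quick");
        docs.put("a.txt", "to be or not to be");
        Map<String, Integer> ids = numberDocuments(docs.keySet());
        System.out.println(ids);                            // {a.txt=0, b.txt=1}
        System.out.println(buildIndex(docs, ids).get("be")); // {0=[1, 5], 1=[0]}
    }
}
```

In a Hadoop setting the result of the first pass would have to be made available to the tasks of the second job; how to do that is left to the exercise.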

Institute of Formal and Applied Linguistics Wiki