Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-28 [2012/01/31 14:39] straka
+++ courses:mapreduce-tutorial:step-28 [2012/02/05 19:05] straka
@@ Line 130: / Line 130: @@
 ===== Exercise 2 =====
-Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''DocWithOccurences<IntWritable>''. Your solution should use two Hadoop jobs:
+Optional. Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''DocWithOccurences<IntWritable>''. Your solution should use two Hadoop jobs:
   - in the first job, create a list of unique document names. Number the documents using the order in this list.
   - in the second job, create for each word a list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document).
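
The two-job structure of this exercise can be sketched in plain Java, without Hadoop, to make the data flow concrete. This is only an illustrative in-memory simulation under stated assumptions: the class ''TwoPassIndexSketch'', its methods, and the ''DocOccurrences'' holder are hypothetical names, ''DocOccurrences'' is a simplified stand-in for ''DocWithOccurences<IntWritable>'' (whose actual fields are not shown here), and numbering documents from 0 in sorted name order is one possible reading of "the order in this list".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class TwoPassIndexSketch {
    /** Hypothetical stand-in for DocWithOccurences<IntWritable>: document id plus word positions. */
    public static final class DocOccurrences {
        public final int docId;
        public final List<Integer> positions;

        public DocOccurrences(int docId, List<Integer> positions) {
            this.docId = docId;
            this.positions = positions;
        }

        @Override
        public String toString() {
            return docId + ":" + positions;
        }
    }

    /** "Job 1": collect the unique document names and number them by sorted order (assumed, from 0). */
    public static Map<String, Integer> numberDocuments(Map<String, String> docs) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String name : new TreeSet<>(docs.keySet())) {
            ids.put(name, ids.size());
        }
        return ids;
    }

    /** "Job 2": for each word, list the documents (by numeric id) and the word's positions in them. */
    public static Map<String, List<DocOccurrences>> buildIndex(
            Map<String, String> docs, Map<String, Integer> ids) {
        Map<String, List<DocOccurrences>> index = new TreeMap<>();
        for (String name : new TreeSet<>(docs.keySet())) {
            // Positions of every word inside this single document.
            Map<String, List<Integer>> positions = new TreeMap<>();
            String[] words = docs.get(name).trim().split("\\s+");
            for (int i = 0; i < words.length; i++) {
                if (words[i].isEmpty()) {
                    continue; // an empty document yields one empty token
                }
                positions.computeIfAbsent(words[i], w -> new ArrayList<>()).add(i);
            }
            // Emit (word, document id + positions), using the id from the first pass.
            for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
                index.computeIfAbsent(e.getKey(), w -> new ArrayList<>())
                     .add(new DocOccurrences(ids.get(name), e.getValue()));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
                "b.txt", "hadoop map reduce map",
                "a.txt", "map only");
        Map<String, Integer> ids = numberDocuments(docs);
        System.out.println(ids);
        System.out.println(buildIndex(docs, ids));
    }
}
```

In a real Hadoop solution the name-to-id mapping produced by the first job would have to be made available to the second job (for example as a side file read by the mappers); the sketch passes it as an ordinary in-memory map instead.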

Institute of Formal and Applied Linguistics Wiki