
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


courses:mapreduce-tutorial:step-28 [2012/01/31 14:39]
straka
courses:mapreduce-tutorial:step-28 [2012/02/05 19:05]
straka
Line 130:
 ===== Exercise 2 =====

-Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''DocWithOccurences<IntWritable>''. Your solution should use two Hadoop jobs:
+Optional. Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''DocWithOccurences<IntWritable>''. Your solution should use two Hadoop jobs:
   - in the first job, create a list of unique document names. Number the documents using the order in this list.
   - in the second job, create for each word a list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document).
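
The two-job structure described in the exercise can be illustrated with a minimal in-memory sketch in plain Java. This is not Hadoop code and not the tutorial's reference solution: the class name ''TwoPassIndexSketch'', both method names, and the input layout (one ''{documentName, text}'' pair per record) are hypothetical. The ''int[]{docId, count}'' pair is only a stand-in for ''DocWithOccurences<IntWritable>'', whose actual fields are not shown on this page, and sorting the names is just one possible way to fix the order of the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: simulates the two passes in memory,
// without any Hadoop classes.
public class TwoPassIndexSketch {

    // Pass 1 (stands in for the first job): collect the unique document
    // names and number them by their position in the sorted list.
    static Map<String, Integer> numberDocuments(List<String[]> records) {
        TreeSet<String> names = new TreeSet<>();
        for (String[] record : records) {
            names.add(record[0]);              // record[0] = document name (assumed layout)
        }
        Map<String, Integer> ids = new TreeMap<>();
        for (String name : names) {
            ids.put(name, ids.size());         // first name gets 0, second 1, ...
        }
        return ids;
    }

    // Pass 2 (stands in for the second job): for each word, build a list of
    // {docId, count} pairs, identifying documents by number instead of name.
    static Map<String, List<int[]>> buildIndex(List<String[]> records,
                                               Map<String, Integer> ids) {
        // word -> (docId -> number of occurrences)
        Map<String, Map<Integer, Integer>> counts = new TreeMap<>();
        for (String[] record : records) {
            int docId = ids.get(record[0]);
            for (String word : record[1].split("\\s+")) {   // record[1] = document text
                if (word.isEmpty()) {
                    continue;
                }
                counts.computeIfAbsent(word, w -> new TreeMap<>())
                      .merge(docId, 1, Integer::sum);
            }
        }
        Map<String, List<int[]>> index = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> entry : counts.entrySet()) {
            List<int[]> postings = new ArrayList<>();
            for (Map.Entry<Integer, Integer> c : entry.getValue().entrySet()) {
                postings.add(new int[] {c.getKey(), c.getValue()});
            }
            index.put(entry.getKey(), postings);
        }
        return index;
    }
}
```

In a real Hadoop solution, the output of the first pass would have to be made available to the tasks of the second job; how to do that is left to the exercise.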
