Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-27 [2012/01/28 20:02] straka |
courses:mapreduce-tutorial:step-27 [2012/01/28 20:14] straka |
||
---|---|---|---|
Line 117: | Line 117: | ||
===== Exercise ===== | ===== Exercise ===== | ||
+ | |||
+ | Imagine you want to create an inverted index. For each word and each document containing that word, the index has to store all positions of the word in the document. | ||
+ | |||
+ | Create a type '' | ||
+ | * stores a document of type '' | ||
+ | * stores a list of positions of occurrence. A sequence of length //N// should be stored on disk as the number //N// followed by the //N// positions of occurrence. Type '' |
+ | * is comparable, comparing using the '' | ||
+ | * has methods '' | ||
+ | |||
+ | Using this type, create an inverted index -- implement a Hadoop job that, for each word, creates a //sorted// list of '' |
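
The on-disk layout asked for in the exercise (the count //N// followed by //N// positions) can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the tutorial's solution: the class name ''PositionListSketch'' and its helper methods are hypothetical, positions are assumed to fit into 4-byte integers, and only ''java.io'' is used so that the sketch runs without Hadoop. The ''DataOutput'' and ''DataInput'' parameters are the same types that Hadoop's ''Writable'' interface uses in its ''write'' and ''readFields'' methods, so the two core methods could be adapted into a custom type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the tutorial or of Hadoop.
final class PositionListSketch {
    // Writes the count N first, then the N positions, each as a 4-byte int
    // (the int width is an assumption of this sketch).
    static void writePositions(DataOutput out, List<Integer> positions) throws IOException {
        out.writeInt(positions.size());
        for (int position : positions) {
            out.writeInt(position);
        }
    }

    // Reads the count N, then exactly N positions, so the reader never
    // consumes bytes that belong to the next record.
    static List<Integer> readPositions(DataInput in) throws IOException {
        int n = in.readInt();
        List<Integer> positions = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            positions.add(in.readInt());
        }
        return positions;
    }

    // Convenience wrapper: serializes the list into an in-memory byte array.
    static byte[] toBytes(List<Integer> positions) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writePositions(new DataOutputStream(buffer), positions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience wrapper: restores the list from a byte array.
    static List<Integer> fromBytes(byte[] bytes) {
        try {
            return readPositions(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> positions = Arrays.asList(3, 17, 42);
        byte[] encoded = toBytes(positions);
        // One int for the count plus three ints for the positions: 16 bytes.
        System.out.println(encoded.length + " bytes, decoded " + fromBytes(encoded));
    }
}
```

Writing the length before the elements is what lets the reader know where the list ends without a terminator value; an empty list is simply stored as the single number 0.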