Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-28 [2012/01/31 14:39] straka
courses:mapreduce-tutorial:step-28 [2012/02/05 19:10] (current) straka
Line 120:
Imagine you want to create an inverted index. In the index, for each word and document containing the word, all positions of the word in the document have to be stored.
- Create a type ''
+ Create a type ''
  * stores a document of type ''
  * stores a list of positions of occurrence. The sequence of length //N// should be stored on disk as the number //N// followed by the //N// positions of occurrence. Type ''
  * is comparable, comparing using the ''
-  * has methods ''
+  * has methods ''
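A minimal sketch of such a type, assuming Java and the on-disk layout described above (the document, then //N//, then //N// positions). The class name ''DocOccurrences'', its method names, and the use of a plain ''String'' for the document are hypothetical stand-ins for the elided names. ''write(DataOutput)'' and ''readFields(DataInput)'' have the same signatures as Hadoop's ''Writable'' methods, so a real version would declare ''implements WritableComparable<...>''. This sketch uses only ''java.io'' so it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the exercise's type; the document is a String here.
// In Hadoop this class would implement WritableComparable<DocOccurrences>.
class DocOccurrences implements Comparable<DocOccurrences> {
    private String doc = "";
    private final List<Integer> positions = new ArrayList<>();

    String getDoc() { return doc; }
    void setDoc(String doc) { this.doc = doc; }
    List<Integer> getPositions() { return positions; }
    void addPosition(int position) { positions.add(position); }
    void clearPositions() { positions.clear(); }

    // On-disk layout: the document, then N, then the N positions.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(doc);
        out.writeInt(positions.size());
        for (int p : positions) out.writeInt(p);
    }

    // Hadoop reuses objects when deserializing, so old positions are cleared first.
    public void readFields(DataInput in) throws IOException {
        doc = in.readUTF();
        positions.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) positions.add(in.readInt());
    }

    // Ordering uses only the document and ignores the positions.
    @Override
    public int compareTo(DocOccurrences other) {
        return doc.compareTo(other.doc);
    }

    // Local helpers for trying the format out (not part of the Hadoop contract).
    static byte[] toBytes(DocOccurrences value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            value.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DocOccurrences fromBytes(byte[] data) {
        try {
            DocOccurrences value = new DocOccurrences();
            value.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing //N// with ''writeInt'' keeps the sketch simple; Hadoop's ''WritableUtils.writeVInt'' and ''readVInt'' would give a more compact variable-length encoding of the same layout.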
- Using this type, create an inverted index: implement a Hadoop job that, for each word, creates a list of ''
+ Using this type, create an inverted index: implement a Hadoop job that, for each word, creates a list of ''
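The data flow of this job can be simulated locally. The sketch below is plain Java, not the Hadoop ''Mapper''/''Reducer'' API: ''map'' emits a (word, document, position) record per token, a ''TreeMap'' stands in for the shuffle, and ''reduce'' groups one word's postings by document. The class and method names, whitespace tokenization, and 0-based positions are assumptions, since the exercise does not fix them.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class InvertedIndexSketch {
    // Value emitted by the mapper: document name plus word position.
    static final class Posting {
        final String doc;
        final int position;
        Posting(String doc, int position) { this.doc = doc; this.position = position; }
    }

    // "Mapper": for one document, emit (word, posting) for every token.
    static List<Map.Entry<String, Posting>> map(String doc, String text) {
        List<Map.Entry<String, Posting>> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return out;
        String[] words = trimmed.split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            out.add(new SimpleEntry<>(words[pos], new Posting(doc, pos)));
        }
        return out;
    }

    // "Reducer": for one word, group positions by document, documents sorted by name.
    static SortedMap<String, List<Integer>> reduce(Iterable<Posting> postings) {
        SortedMap<String, List<Integer>> byDoc = new TreeMap<>();
        for (Posting p : postings) {
            byDoc.computeIfAbsent(p.doc, d -> new ArrayList<>()).add(p.position);
        }
        // A real reducer sees values in no guaranteed order, so sort the positions.
        for (List<Integer> positions : byDoc.values()) positions.sort(null);
        return byDoc;
    }

    // Local stand-in for the shuffle: group mapper output by word, keys sorted.
    static SortedMap<String, SortedMap<String, List<Integer>>> run(Map<String, String> docs) {
        SortedMap<String, List<Posting>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> d : docs.entrySet()) {
            for (Map.Entry<String, Posting> kv : map(d.getKey(), d.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), w -> new ArrayList<>()).add(kv.getValue());
            }
        }
        SortedMap<String, SortedMap<String, List<Integer>>> index = new TreeMap<>();
        for (Map.Entry<String, List<Posting>> g : grouped.entrySet()) {
            index.put(g.getKey(), reduce(g.getValue()));
        }
        return index;
    }
}
```

For the documents ''a.txt'' = "to be or not to be" and ''b.txt'' = "be quick", the entry for "be" maps ''a.txt'' to positions 1 and 5 and ''b.txt'' to position 0.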
===== Exercise 2 =====
- Improve the solution to identify the documents by their ids instead of names, i.e., create for each word a sequence of ''
+ Optional.
  - in the first job, create a list of unique document names. Number the documents using the order in this list.
-  - in the second job, create for each word a list of ''
+  - in the second job, create for each word a list of ''
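The numbering step of Exercise 2 can be sketched locally as well. This assumes the first job's output is the sorted list of unique document names (as a single reducer would produce, since Hadoop sorts keys within a reducer's output) and that ids start at 0; both choices, and the class and method names, are hypothetical. How the second job reads this list is left open here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class DocumentNumbering {
    // Stand-in for job 1: unique document names in sorted order.
    static List<String> uniqueNames(Collection<String> docNames) {
        return new ArrayList<>(new TreeSet<>(docNames));
    }

    // A document's id is its index in the list produced by job 1 (0-based here).
    static Map<String, Integer> assignIds(List<String> names) {
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < names.size(); i++) ids.put(names.get(i), i);
        return ids;
    }

    // Stand-in for job 2's output for one word: postings keyed by document id.
    static SortedMap<Integer, List<Integer>> renumber(Map<String, List<Integer>> byName,
                                                      Map<String, Integer> ids) {
        SortedMap<Integer, List<Integer>> byId = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : byName.entrySet()) {
            Integer id = ids.get(e.getKey());
            if (id == null) throw new IllegalArgumentException("unknown document " + e.getKey());
            byId.put(id, e.getValue());
        }
        return byId;
    }
}
```

Keying postings by an integer id keeps each stored entry small and makes the per-word list sort numerically by document.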
----
Line 145:
</
</
+