Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-27 [2012/01/28 20:02] straka |
courses:mapreduce-tutorial:step-27 [2012/01/31 13:13] straka |
||
---|---|---|---|
Line 24: | Line 24: | ||
public void write(DataOutput out) throws IOException { | public void write(DataOutput out) throws IOException { | ||
- | int highest_pos | + | int mask_shift |
- | while (highest_pos | + | while (mask_shift |
- | while (highest_pos | + | while (mask_shift |
- | out.writeByte(0x80 | ((value >> | + | out.writeByte(0x80 | ((value >> |
- | | + | |
} | } | ||
out.writeByte(value & 0x7F); | out.writeByte(value & 0x7F); | ||
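The hunk above is truncated, so the exact loop conditions are not recoverable. As one possible reading, the sketch below writes an integer in 7-bit groups, most significant group first, with the high bit marking "more bytes follow". The class name ''VarIntSketch'', the helpers ''encode'' and ''decode'', and the loop bounds are assumptions made for illustration, not the tutorial's code; only ''java.io'' is used so the example runs without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class; the name is not taken from the tutorial.
final class VarIntSketch {
  // Writes value (treated as an unsigned 32-bit number) as 7-bit groups,
  // most significant group first. Every byte except the last has 0x80 set.
  static void write(DataOutput out, int value) throws IOException {
    int maskShift = 0;
    // Find the shift of the highest non-empty 7-bit group (at most 28).
    while (maskShift + 7 < 32 && (value >>> (maskShift + 7)) != 0) {
      maskShift += 7;
    }
    // Emit all groups above the lowest one with the continuation bit set.
    while (maskShift > 0) {
      out.writeByte(0x80 | ((value >>> maskShift) & 0x7F));
      maskShift -= 7;
    }
    // The lowest group has no continuation bit and ends the number.
    out.writeByte(value & 0x7F);
  }

  // Reads bytes until one without the continuation bit is seen.
  static int read(DataInput in) throws IOException {
    int value = 0;
    int b;
    do {
      b = in.readByte();
      value = (value << 7) | (b & 0x7F);
    } while ((b & 0x80) != 0);
    return value;
  }

  // Convenience wrapper for experiments: encode into a byte array.
  static byte[] encode(int value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      write(new DataOutputStream(bytes), value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Convenience wrapper for experiments: decode from a byte array.
  static int decode(byte[] data) {
    try {
      return read(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this sketch, ''VarIntSketch.encode(300)'' produces the two bytes ''0x82 0x2C'', and values below 128 occupy a single byte.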
Line 117: | Line 117: | ||
===== Exercise ===== | ===== Exercise ===== | ||
+ | |||
+ | Imagine you want to create an inverted index. For each word and each document containing that word, the index has to store all positions of the word in the document. | ||
+ | |||
+ | Create a type '' | ||
+ | * stores a document of type '' | ||
+ | * stores a list of positions of occurrence. The sequence of length //N// should be stored on disk as number //N// followed by //N// numbers -- positions of occurrence. Type '' | ||
+ | * is comparable, comparing using the '' | ||
+ | * has methods '' | ||
+ | |||
+ | Using this type, create an inverted index -- implement a Hadoop job that, for each word, creates a //sorted// list of '' | ||
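The on-disk layout described above (the number //N// followed by //N// positions) can be sketched as follows. This is a minimal illustration under stated assumptions, not a solution to the exercise: the class name ''DocPositionsSketch'', the ''String'' document field, the fixed-width ''writeInt'' encoding and the comparison by document name are all hypothetical, since the type names in the exercise text are not visible here. The ''write'' and ''readFields'' methods mirror the shape of Hadoop's ''Writable'' interface, but the class deliberately depends only on ''java.io'' so that it runs standalone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical type: a document name plus the positions of one word in it.
final class DocPositionsSketch implements Comparable<DocPositionsSketch> {
  private String document = "";
  private final List<Integer> positions = new ArrayList<>();

  void setDocument(String document) { this.document = document; }
  String getDocument() { return document; }
  void addPosition(int position) { positions.add(position); }
  List<Integer> getPositions() { return positions; }

  // Layout: document name, then N, then the N positions.
  void write(DataOutput out) throws IOException {
    out.writeUTF(document);
    out.writeInt(positions.size());
    for (int position : positions) {
      out.writeInt(position);
    }
  }

  // Reads the same layout back. The list is cleared first because
  // Hadoop reuses value objects between records.
  void readFields(DataInput in) throws IOException {
    document = in.readUTF();
    positions.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      positions.add(in.readInt());
    }
  }

  // Assumption: ordering by document name only.
  @Override
  public int compareTo(DocPositionsSketch other) {
    return document.compareTo(other.document);
  }

  // Convenience wrapper for experiments: serialize to a byte array.
  static byte[] toBytes(DocPositionsSketch value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      value.write(new DataOutputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Convenience wrapper for experiments: deserialize from a byte array.
  static DocPositionsSketch fromBytes(byte[] data) {
    DocPositionsSketch value = new DocPositionsSketch();
    try {
      value.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return value;
  }
}
```

Storing the count before the elements lets ''readFields'' know how many numbers to consume without a terminator value. A variable-length integer encoding could replace ''writeInt'' to save space when positions are small.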
+ | |||
+ | |||
+ | |||