courses:mapreduce-tutorial:step-14 [2012/01/25 23:15] straka
====== MapReduce Tutorial : Exercise - N-gram language model ======

For a given //N//, create a simple N-gram language model. You can experiment on the following data:

^ Path ^ Size ^
| / | |
| / | |
| / | |

Your model should contain all the unigrams, bigrams, ..., //N//-grams together with the number of their occurrences in the given corpus.
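A minimal sketch of the counting itself (illustrative only; names and tokenization are assumptions, and the exercise expects this to run as a MapReduce job rather than in memory):

```python
# Hypothetical sketch: count every k-gram for k = 1..N in a tokenized corpus.
from collections import Counter

def ngram_counts(tokens, n):
    """Return a Counter mapping each k-gram (k <= n), as a tuple of words, to its count."""
    counts = Counter()
    for k in range(1, n + 1):
        # Slide a window of length k over the token sequence.
        for i in range(len(tokens) - k + 1):
            counts[tuple(tokens[i:i + k])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens, 2)
# e.g. counts[("the",)] == 2 and counts[("the", "cat")] == 1
```

In a MapReduce setting, the mapper would emit each k-gram with count 1 and the reducer would sum the counts per k-gram.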

As the size of the resulting model matters, you should represent the //N//-grams efficiently. Try using the following representation:
  * Find the unique words of the corpus and sort them according to the number of their occurrences
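The frequency-sorted vocabulary step above can be sketched as follows (an illustrative assumption about the intent: giving the most frequent words the smallest integer ids, which can then be stored in fewer bytes, e.g. with a variable-length encoding):

```python
# Hypothetical sketch: assign word ids so that frequent words get small ids.
from collections import Counter

def build_word_ids(tokens):
    """Map each unique word to an integer id; more frequent words get smaller ids."""
    freqs = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically for determinism.
    vocab = sorted(freqs, key=lambda w: (-freqs[w], w))
    return {word: i for i, word in enumerate(vocab)}

tokens = "a b a c a b".split()
ids = build_word_ids(tokens)
# "a" occurs 3x -> id 0, "b" 2x -> id 1, "c" 1x -> id 2
```

Each //N//-gram can then be stored as a short sequence of such ids instead of the words themselves.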