Differences
This shows you the differences between two versions of the page.
| | courses:mapreduce:introduction [2012/01/13 11:13] majlis removed | — (current) | |
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Large data processing using MapReduce ====== | ||
| - | |||
| - | For an introduction, | ||
| - | There are also Czech [[http:// | ||
| - | |||
| - | There are nice slides from the three-day course available at [[http:// | ||
| - | I would suggest starting with http:// | ||
| - | |||
| - | Now is a good time to solve the following exercises: | ||
| - | * create a list of unique words present in a given text | ||
| - | * count all bigrams present in a given text | ||
| - | * count all n-grams for all n <= N in a given text | ||
| - | * compute the probability that a word is capitalized | ||
| - | * given a large corpus, find all undiacritized forms of words present in the corpus and for every such form, compute the most probable diacritization | ||
| - | * create an index: given many URLs and their texts, create for each word a list of the URLs whose text contains this word. For each such URL, produce an ascending list of the positions of this word in the document. | ||
| - | * implement the iterative k-means algorithm | ||
| - | |||
| - | The following slides discuss solutions to various problems using MR: | ||
| - | * http:// | ||
| - | * http:// | ||
| - | * http:// | ||
| - | * http:// | ||
| - | |||
| - | There is also a paper about implementing various machine learning algorithms (SVM, EM, Bayes, etc.) using MapReduce on multicore machines, which is also applicable to distributed computations: | ||
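
As a rough illustration of the bigram-counting exercise listed above, the following is a minimal local sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop or any other MapReduce framework. The names `map_bigrams`, `reduce_sum`, `run_mapreduce`, and `count_bigrams` are hypothetical and chosen only for this example. The sketch assumes whitespace tokenization and that bigrams do not span line boundaries.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Bigram = Tuple[str, str]


def map_bigrams(line: str) -> Iterator[Tuple[Bigram, int]]:
    """Map phase (hypothetical helper): emit ((w1, w2), 1) for each adjacent word pair."""
    words = line.split()  # assumption: whitespace tokenization
    for left, right in zip(words, words[1:]):
        yield (left, right), 1


def reduce_sum(key: Bigram, values: List[int]) -> Tuple[Bigram, int]:
    """Reduce phase (hypothetical helper): sum all counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[Bigram, int]]],
    reducer: Callable[[Bigram, List[int]], Tuple[Bigram, int]],
) -> Dict[Bigram, int]:
    """Simulate map -> shuffle -> reduce in a single process."""
    groups: Dict[Bigram, List[int]] = defaultdict(list)
    # Map and shuffle: group every emitted value under its key.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: call the reducer once per key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


def count_bigrams(lines: Iterable[str]) -> Dict[Bigram, int]:
    """Count word bigrams within each input line."""
    return run_mapreduce(lines, map_bigrams, reduce_sum)


if __name__ == "__main__":
    for bigram, count in count_bigrams(["a b a b", "b a"]).items():
        print(bigram, count)
```

The same `run_mapreduce` skeleton can be reused for the other counting exercises by swapping the mapper. For example, a mapper that emits `(word, 1)` gives plain word counts.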
