Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision | |||
courses:mapreduce:introduction [2012/01/13 11:13] majlis removed |
— (current) | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Large data processing using MapReduce ====== | ||
- | |||
- | For an introduction, | ||
- | There are also Czech [[http:// | ||
- | |||
- | There are nice slides from the three-day course available at [[http:// | ||
- | I suggest starting with http:// | ||
- | |||
- | Now is a good time to solve the following exercises: | ||
- | * create a list of unique words present in a given text | ||
- | * count all bigrams present in a given text | ||
- | * count all n-grams for all n <= N in a given text | ||
- | * compute the probability that a word is capitalized | ||
- | * given a large corpus, find all undiacritized forms of words present in the corpus and, for every such form, compute the most probable diacritization | ||
- | * create an index: given many URLs and their texts, create for each word a list of the URLs whose text contains this word. For each such URL, produce an ascending list of the positions of this word in the document. | ||
- | * implement the iterative k-means algorithm | ||
- | |||
- | The following slides discuss solutions to various problems using MR: | ||
- | * http:// | ||
- | * http:// | ||
- | * http:// | ||
- | * http:// | ||
- | |||
- | There is also a paper about implementing various machine learning algorithms (SVM, EM, Bayes, etc.) using MapReduce on multicore machines, which also applies to distributed computations: |
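
The first two exercises listed above (unique words and bigram counts) can be sketched as mapper and reducer pairs. The following is only a minimal single-process illustration in Python, not the framework used in the course: `run_mapreduce`, `map_words`, `map_bigrams` and `reduce_sum` are hypothetical names introduced here, and the in-memory grouping merely stands in for the shuffle phase that a real MapReduce system performs across machines. It also assumes that words are whitespace-separated and that bigrams do not span line boundaries.

```python
from collections import defaultdict


def map_words(line):
    """Mapper (hypothetical): emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def map_bigrams(line):
    """Mapper (hypothetical): emit ((w1, w2), 1) for each adjacent word pair."""
    words = line.split()
    for pair in zip(words, words[1:]):
        yield pair, 1


def reduce_sum(key, values):
    """Reducer (hypothetical): sum all counts collected for one key."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Toy driver: map every record, group values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stands in for the shuffle phase
    output = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            output[out_key] = out_value
    return output


text = ["the cat sat", "the cat ran"]

# Exercise 1: the keys of the word-count output are the unique words.
word_counts = run_mapreduce(text, map_words, reduce_sum)
unique_words = sorted(word_counts)

# Exercise 2: bigram counts, counted within each line only.
bigram_counts = run_mapreduce(text, map_bigrams, reduce_sum)
```

Because the reducer only sums, the same function could also serve as a combiner in a real framework, reducing the amount of data sent between the map and reduce phases.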