Institute of Formal and Applied Linguistics Wiki


courses:mapreduce:introduction [2012/01/13 11:13] majlis: removed
====== Large-scale data processing using MapReduce ======

For an introduction, it is best to read the [[http://fox.auryn.cz/mr/original_paper_dean04.pdf|original paper]] (Dean and Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004).
There are also [[http://fox.auryn.cz/mr/slides_czech_2009.pdf|slides in Czech]] (read up to slide 45).

Good slides from a three-day course are available at [[http://sites.google.com/site/mriap2008/lectures]].
I suggest starting with http://sites.google.com/site/mriap2008/intro_to_mapreduce.pdf.
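To make the programming model concrete, here is a minimal single-process sketch of word count, the running example of the original paper. It assumes the input is a list of (document name, text) pairs. The names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, and a real framework such as Hadoop runs the three phases distributed over many machines rather than in one loop.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_name, text):
    # Map: emit an intermediate (word, 1) pair for every word occurrence.
    for word in text.split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce: sum all counts emitted for the same word.
    yield word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    # Map phase: apply the mapper to every (key, value) input record.
    intermediate = [pair for key, value in documents for pair in mapper(key, value)]
    # Shuffle phase: sort by key so that equal keys become adjacent.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: call the reducer once per distinct key with all its values.
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(key, (value for _, value in group)))
    return output


docs = [("doc1", "to be or not to be"), ("doc2", "to do")]
result = run_mapreduce(docs, map_fn, reduce_fn)
# result == [("be", 2), ("do", 1), ("not", 1), ("or", 1), ("to", 3)]
```

The sort-then-group step mirrors the idea that the framework, not the user code, brings all values for one key together before the reducer sees them.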

Now is a good time to solve the following exercises:
  * create a list of the unique words present in a given text
  * count all bigrams present in a given text
  * count all n-grams for all n <= N in a given text
  * compute, for each word, the probability that it is capitalized
  * given a large corpus, find all undiacritized forms of the words present in it and, for every such form, compute its most probable diacritization
  * create an index: given many URLs and their texts, create for each word a list of the URLs whose text contains the word; for each such URL, produce an ascending list of the positions of the word in the document
  * implement the iterative k-means algorithm
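Possible approaches to most of the exercises above can be sketched against a tiny in-memory driver: the hypothetical `map_reduce` below, in which each reducer returns one value per key. These are illustrative sketches under simplifying assumptions, not reference solutions: words are whitespace-separated tokens, n-grams do not cross line boundaries, index positions are 0-based, and diacritics are stripped by Unicode NFD decomposition. All function names are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def map_reduce(inputs, mapper, reducer):
    # Hypothetical in-memory driver: run the mapper on every input record,
    # group emitted values by key (the "shuffle"), then reduce each group
    # to a single output value.
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def unique_words(lines):
    # Map emits every word as a key; the keys of the result are the unique words.
    return list(map_reduce(lines, lambda line: ((w, None) for w in line.split()),
                           lambda word, values: None))


def ngram_counts(lines, max_n):
    # Map emits (n-gram, 1) for every n-gram with n <= max_n; reduce sums.
    # Bigram counting is the special case of the keys of length 2.
    def mapper(line):
        words = line.split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                yield tuple(words[i:i + n]), 1
    return map_reduce(lines, mapper, lambda ngram, ones: sum(ones))


def capitalization_probability(lines):
    # Key: lowercased word; value: 1 if this occurrence starts with a capital.
    def mapper(line):
        for w in line.split():
            yield w.lower(), int(w[:1].isupper())
    return map_reduce(lines, mapper, lambda word, flags: sum(flags) / len(flags))


def strip_diacritics(word):
    # NFD splits accented letters into base letter + combining mark; drop the marks.
    return "".join(c for c in unicodedata.normalize("NFD", word)
                   if not unicodedata.combining(c))


def best_diacritization(lines):
    # Key: undiacritized form; reduce picks the most frequent original form.
    def mapper(line):
        for w in line.split():
            yield strip_diacritics(w), w
    return map_reduce(lines, mapper,
                      lambda form, variants: Counter(variants).most_common(1)[0][0])


def inverted_index(documents):
    # documents: iterable of (url, text); map emits (word, (url, position)).
    def mapper(doc):
        url, text = doc
        for position, w in enumerate(text.split()):
            yield w, (url, position)

    def reducer(word, postings):
        # Group postings by URL and sort positions in ascending order.
        by_url = defaultdict(list)
        for url, position in postings:
            by_url[url].append(position)
        return {url: sorted(ps) for url, ps in by_url.items()}
    return map_reduce(documents, mapper, reducer)
```

In a real MapReduce job the index reducer receives postings in no guaranteed order, so the positions must be sorted either in the reducer (as here) or by a secondary sort in the framework.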

The following slides discuss MapReduce (MR) solutions to various problems:
  * http://sites.google.com/site/mriap2008/what_is_mapreduce.pdf
  * http://sites.google.com/site/mriap2008/word_context_enthropy.pdf
  * http://sites.google.com/site/mriap2008/hadoop_and_k_means.pdf (pages 23-30)
  * http://sites.google.com/site/mriap2008/not_everything_is_nail.pdf (problems that are difficult to solve with MR)
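The iterative k-means exercise is commonly expressed as one MapReduce job per iteration. The map step assigns each point to its nearest centroid and emits (centroid id, (coordinates, 1)). The reduce step adds the coordinate sums and counts and divides to get the new centroid, and a driver repeats this until the centroids stop moving. The sketch below assumes points are tuples of numbers, Euclidean distance and a given initial centroid list. `kmeans_iteration` and `kmeans` are hypothetical names, and this is not necessarily the formulation used in the slides.

```python
import math


def kmeans_iteration(points, centroids):
    # Map: emit (nearest centroid id, (coordinates, 1)) for each point.
    # Shuffle + reduce: add coordinate vectors and counts per centroid id.
    # Emitting sums and counts (not means) lets a combiner pre-aggregate them.
    sums = {}
    for point in points:
        cid = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        vec, count = sums.get(cid, ([0.0] * len(point), 0))
        sums[cid] = ([a + b for a, b in zip(vec, point)], count + 1)
    # Finalize: the new centroid is the mean; empty clusters keep their old centroid.
    new = list(centroids)
    for cid, (vec, count) in sums.items():
        new[cid] = tuple(x / count for x in vec)
    return new


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    # Driver: each loop iteration would be one MapReduce job in a real setup.
    for _ in range(max_iter):
        new = kmeans_iteration(points, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(centroids, new)):
            return new
        centroids = new
    return centroids
```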

There is also a paper on implementing various machine learning algorithms (SVM, EM, naive Bayes, etc.) with MapReduce on multicore machines; the approach applies to distributed computation as well: [[http://fox.auryn.cz/mr/machine_learning_using_mr_nips06.pdf]].
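The paper's central idea is that many learning algorithms can be written in "summation form": each mapper computes sums of statistics over its part of the data, and the reducer only adds the partial sums. As a minimal illustration (my own choice of example, a simple least-squares line fit, not a specific algorithm taken from the paper), the sufficient statistics n, Σx, Σy, Σx² and Σxy are computed per shard and combined. The function names are hypothetical.

```python
def map_shard(shard):
    # Each mapper computes sufficient statistics over its shard of (x, y) pairs.
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(partials):
    # The statistics are plain sums, so the reducer adds them element-wise.
    return tuple(map(sum, zip(*partials)))


def fit_line(shards):
    # The map calls are independent and could run in parallel on separate cores
    # or machines; only the five summed statistics reach the final solve.
    n, sx, sy, sxx, sxy = reduce_stats([map_shard(s) for s in shards])
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Points on y = 2x + 1, split into two shards.
shards = [[(0, 1), (1, 3)], [(2, 5), (3, 7), (4, 9)]]
slope, intercept = fit_line(shards)
# slope == 2.0, intercept == 1.0
```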

[ Back to the navigation ] [ Back to the content ]