Institute of Formal and Applied Linguistics Wiki

Large data processing using MapReduce

For an introduction, it is best to read the original paper.
There are also Czech slides (up to slide 45).

Good slides from a three-day course are available at http://sites.google.com/site/mriap2008/lectures.
I suggest starting with http://sites.google.com/site/mriap2008/intro_to_mapreduce.pdf .
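
As a quick orientation before reading the slides, the programming model can be illustrated with a toy word count. This is only a minimal single-process sketch in plain Python, not Hadoop code and not taken from the paper or the slides; the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical and chosen for this illustration.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all partial counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Toy driver (an assumption of this sketch): map every document,
    group the emitted values by key (the "shuffle"), then reduce each group."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)
    # Keys are processed in sorted order, as a real framework would
    # present them to each reducer.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = {"d1": "to be or not to be", "d2": "to do"}
    print(run_mapreduce(docs, map_words, reduce_counts))
```

A real framework runs the mappers and reducers on many machines and handles the grouping, data movement and failures itself; only the two user-supplied functions resemble what is shown here.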

Now is a good time to solve the following exercises:

The following slides discuss solutions to various problems using MapReduce:

There is also a paper on implementing various machine learning algorithms (SVM, EM, Bayes, etc.) using MapReduce on multicore machines; the approach also applies to distributed computation: http://fox.auryn.cz/mr/machine_learning_using_mr_nips06.pdf.

Code Template

A code template for installing and running Hadoop on UFAL workstations is also available.

wget 'http://ufallab.ms.mff.cuni.cz/~majlis/mapreduce-tutorial.tar.gz'
tar -xzf mapreduce-tutorial.tar.gz
cd mapreduce-tutorial

This template has been tested in the UFAL environment. If you plan to use the code elsewhere, adjust the first two lines of the Makefile. The template also contains an Eclipse project with the paths already set up.
