MapReduce Tutorial
- Part 1: Monday January 30, 14:00-17:00, lab SU2
- Part 2: Tuesday January 31, 14:00-17:00, lab SU2
Materials
Day 1
Today we will be using the Perl API. There is no need to study it beforehand; the tutorial will explain it.
Environment
- Step 1: Setting the environment.
MapReduce basics
Controlling the cluster
From now on, run all examples using a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job.
MapReduce extended
- Step 8: Multiple mappers, reducers and partitioning.
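To make the idea of partitioning concrete, here is a minimal sketch in plain Python (not the Perl API used in this tutorial). It assumes a simple hash-modulo scheme, which is similar in spirit to Hadoop's default hash partitioner. The names `partition` and `route` are hypothetical, and `zlib.crc32` is only a stand-in for the framework's own hash function.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index; equal keys always get the same index."""
    # crc32 is a stand-in hash (an assumption for this sketch). Python's
    # built-in hash() is salted per process, so it is not stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Group mapper output (key, value) pairs by the reducer that receives them."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return dict(buckets)


mapper_output = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1)]
routed = route(mapper_output, num_reducers=3)
for reducer_index in sorted(routed):
    print(reducer_index, routed[reducer_index])
```

Because the reducer index depends only on the key, every value for a given key ends up at the same reducer, no matter which mapper produced it. With a single reducer, all pairs land in bucket 0.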
Hadoop properties
- sorting
Combiners
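As a rough illustration of what a combiner does, the sketch below uses plain Python rather than the tutorial's Perl API, with word counting as the example task. The function names `map_words` and `combine` are hypothetical. The combiner sums counts on the mapper side, so fewer pairs have to be sent on to the reducers.

```python
from collections import Counter


def map_words(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word, 1


def combine(pairs):
    """Sum the values per key locally, before anything is sent to a reducer."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


line = "to be or not to be"
raw = list(map_words(line))   # one pair per word occurrence
combined = combine(raw)       # one pair per distinct word
print(len(raw), "pairs before combining,", len(combined), "after")
print(combined)
```

This works for word counting because addition is associative and commutative, so the reducer gets the same final sums whether or not the combiner has run.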
setup, cleanup, perl inplace
Work dir
N-grams
K-means and Iterations