MapReduce Tutorial
- Part 1: Monday January 30, 14:00-17:00, lab SU2
- Part 2: Tuesday January 31, 14:00-17:00, lab SU2
Materials
Day 1
Today we will be using the Perl API (there is no need to study it now; the tutorial explains it as we go).
Environment
- Step 1: Setting up the environment.
MapReduce basics
Controlling the cluster
MapReduce extended
From now on, it is best to run MR jobs on a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably that each job gets only one reducer.
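A single reducer per job is limiting because of how map output keys are routed. Hadoop's default HashPartitioner sends each key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. With one reducer, every key lands in partition 0 and all reduce work runs in one task.

The Java sketch below mimics only that formula. It does not use Hadoop, and the class `PartitionSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: mirrors the formula of Hadoop's default
// HashPartitioner without depending on Hadoop itself.
class PartitionSketch {

    // Returns the reducer index in [0, numReducers) for the given key.
    static int partition(String key, int numReducers) {
        // Clear the sign bit so negative hash codes still give a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Groups keys by the reducer that would receive them.
    static Map<Integer, List<String>> assign(List<String> keys, int numReducers) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partition(key, numReducers), r -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "date", "elderberry");
        // One reducer (as in a local run): every key goes to partition 0.
        System.out.println(assign(keys, 1));
        // Several reducers: the same keys are spread over partitions 0..2.
        System.out.println(assign(keys, 3));
    }
}
```

In a real job, the number of reducers is set on the job configuration. The partitioning step itself stays the same.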
Advanced MapReduce exercises
Exercises in this section can be done in any order, but it is recommended to try solving all of them. The Perl API reference may come in handy.
Day 2
Today we will be using the Java API.
Environment
Java Hadoop basics
Exercises
Advanced topics
- Custom input format – WholeFile and WholeFileAsPath
- Custom data type – Pair<A, B>
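A custom Hadoop data type implements the Writable interface. Its `write(DataOutput)` method serializes the object and its `readFields(DataInput)` method deserializes it. A generic `Pair<A, B>` can do this by delegating to its two components in a fixed order.

This is a minimal sketch, and the tutorial's own `Pair` may be written differently. To keep it runnable without Hadoop on the classpath:
- `SimpleWritable` is a hypothetical stand-in for `org.apache.hadoop.io.Writable`.
- `IntBox`, `TextBox` and `PairDemo` are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's Writable, so the sketch compiles standalone.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical component types used only for the demonstration.
class IntBox implements SimpleWritable {
    int value;
    IntBox(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class TextBox implements SimpleWritable {
    String value;
    TextBox(String value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

// Generic pair: serializes the first component, then the second.
class Pair<A extends SimpleWritable, B extends SimpleWritable> implements SimpleWritable {
    final A first;
    final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    // Refills the existing components in place, in the same order as write().
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }
}

class PairDemo {
    // Serializes `source` to bytes and reads them back into `target`.
    static void roundTrip(SimpleWritable source, SimpleWritable target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            source.write(out);
            out.flush();
            target.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Pair<IntBox, TextBox> original = new Pair<>(new IntBox(42), new TextBox("hello"));
        Pair<IntBox, TextBox> copy = new Pair<>(new IntBox(0), new TextBox(""));
        roundTrip(original, copy);
        System.out.println(copy.first.value + " " + copy.second.value);
    }
}
```

Because `readFields()` refills existing objects, a `Pair` must be constructed with component instances before it is deserialized. This matches Hadoop's habit of reusing Writable objects between records. A `Pair` used as a map output key would additionally need to implement WritableComparable so the framework can sort it.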