====== MapReduce Tutorial ======

  * Part 1: Monday January 30, 14:00-17:00, lab SU2
  * Part 2: Tuesday January 31, 14:00-17:00, lab SU2

===== Materials =====

  * [[.:mapreduce-tutorial:Introduction]]

===== Overview =====

  * [[.:mapreduce-tutorial:Hadoop job overview]]
  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
  * [[.:mapreduce-tutorial:Running jobs]]
  * [[.:mapreduce-tutorial:Perl API]], [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
  * [[.:mapreduce-tutorial:Making your job configurable]]
  * [[.:mapreduce-tutorial:If things go wrong]]

===== Day 1 =====

Today we will be using the [[.:mapreduce-tutorial:Perl API]]. There is no need to study it now; the tutorial will explain it.

=== Environment ===

  * [[.:mapreduce-tutorial:Step 1]]: Setting the environment.

=== MapReduce basics ===

  * [[.:mapreduce-tutorial:Step 2]]: Input and output format, testing data.
  * [[.:mapreduce-tutorial:Step 3]]: Basic mapper.
  * [[.:mapreduce-tutorial:Step 4]]: Counters.
  * [[.:mapreduce-tutorial:Step 5]]: Basic reducer.

=== Controlling the cluster ===

  * [[.:mapreduce-tutorial:Step 6]]: Running on cluster.
  * [[.:mapreduce-tutorial:Step 7]]: Dynamic Hadoop cluster for several computations.

=== MapReduce extended ===

  * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.
  * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
  * [[.:mapreduce-tutorial:Step 10]]: Combiners.
  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.
  * [[.:mapreduce-tutorial:Step 12]]: Additional output from mappers and reducers.

=== Advanced MapReduce exercises ===

The exercises in this section can be done in any order, but it is recommended to try solving all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy.

  * [[.:mapreduce-tutorial:Step 13]]: Sorting.
  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
  * [[.:mapreduce-tutorial:Step 15]]: K-means clustering.
=== Beyond MapReduce ===

  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce.

===== Day 2 =====

Today we will be using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]].

=== Environment ===

  * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment.
  * [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting Eclipse.

=== Java Hadoop basics ===

  * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types.
  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters.
  * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners.
  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration.
  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file.

=== Advanced topics ===

  * [[.:mapreduce-tutorial:Step 28]]: Custom data types.
  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators.
  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats.

=== Beyond MapReduce ===

  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.

===== Other =====

  * [[user:majlis:hadoop|Further information]]