
Institute of Formal and Applied Linguistics Wiki



This shows you the differences between two versions of the page.


courses:mapreduce-tutorial [2012/01/25 15:40]
courses:mapreduce-tutorial [2012/02/05 20:01] (current)
Line 6 (old) / Line 6 (new):
   * [[.:mapreduce-tutorial:Introduction]]
 +===== Overview =====
 +  * [[.:mapreduce-tutorial:Hadoop job overview]]
 +  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
 +  * [[.:mapreduce-tutorial:Running jobs]]
 +  * [[.:mapreduce-tutorial:Perl API]], [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
 +  * [[.:mapreduce-tutorial:Making your job configurable]]
 +  * [[.:mapreduce-tutorial:If things go wrong]]
 ===== Day 1 =====
Line 21 (old) / Line 28 (new):
   * [[.:mapreduce-tutorial:Step 6]]: Running on a cluster.
   * [[.:mapreduce-tutorial:Step 7]]: Dynamic Hadoop cluster for several computations.
-**From now on, run all examples using a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job.** 
 === MapReduce extended ===
Line 28 (old) / Line 33 (new):
   * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
   * [[.:mapreduce-tutorial:Step 10]]: Combiners.
-  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks.
-  * [[.:mapreduce-tutorial:Step 12]]: Reducers
-  *  Initialization and cleanup of MR tasks.
-  * Work dir.
 +  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.
 +  * [[.:mapreduce-tutorial:Step 12]]: Additional output from mappers and reducers.
 === Advanced MapReduce exercises ===
-  sorting
-  * N-grams with indexes
-  * K-means
 +Exercises in this section can be done in any order, but we recommend attempting all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy.
 +  * [[.:mapreduce-tutorial:Step 13]]: Sorting.
 +  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
 +  * [[.:mapreduce-tutorial:Step 15]]: K-means clustering. 
 +=== Beyond MapReduce === 
 +  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce. 
 +===== Day 2 ===== 
 +Today we will be using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]. 
 +=== Environment === 
 +  * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment. 
 +  * [[.:mapreduce-tutorial:Step 22]]: Setting up Eclipse (optional).
 +=== Java Hadoop basics ===
 +  * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types. 
 +  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters. 
 +  * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners. 
 +  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration. 
 +  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file. 
 +=== Advanced topics === 
 +  * [[.:mapreduce-tutorial:Step 28]]: Custom data types. 
 +  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators. 
 +  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats. 
 +=== Beyond MapReduce === 
 +  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.
 ===== Other =====
   * [[user:majlis:hadoop|Further information]]
