Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial [2012/01/25 15:44]
straka
courses:mapreduce-tutorial [2012/02/05 20:01]
straka
Line 6: Line 6:
   * [[.:mapreduce-tutorial:Introduction]]   * [[.:mapreduce-tutorial:Introduction]]
  
 +===== Overview =====
 +  * [[.:mapreduce-tutorial:Hadoop job overview]]
 +  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
 +  * [[.:mapreduce-tutorial:Running jobs]]
 +  * [[.:mapreduce-tutorial:Perl API]], [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
 +  * [[.:mapreduce-tutorial:Making your job configurable]]
 +  * [[.:mapreduce-tutorial:If things go wrong]]
  
 ===== Day 1 ===== ===== Day 1 =====
Line 21: Line 28:
   * [[.:mapreduce-tutorial:Step 6]]: Running on cluster.   * [[.:mapreduce-tutorial:Step 6]]: Running on cluster.
   * [[.:mapreduce-tutorial:Step 7]]: Dynamic Hadoop cluster for several computations.   * [[.:mapreduce-tutorial:Step 7]]: Dynamic Hadoop cluster for several computations.
- 
-**From now on, run all examples using a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job.** 
  
 === MapReduce extended === === MapReduce extended ===
   * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.   * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.
   * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.   * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
-  * [[.:mapreduce-tutorial:Step 10]]: Properties of reducers, combiners+  * [[.:mapreduce-tutorial:Step 10]]: Combiners
-  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks.+  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.
   * [[.:mapreduce-tutorial:Step 12]]: Additional output from mappers and reducers.   * [[.:mapreduce-tutorial:Step 12]]: Additional output from mappers and reducers.
  
 === Advanced MapReduce exercises === === Advanced MapReduce exercises ===
-  * [[.:mapreduce-tutorial:Step 13]]: Sorting +Exercises in this section can be done in any order, but it is recommended to try solving all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy. 
-  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model +  * [[.:mapreduce-tutorial:Step 13]]: Sorting. 
-  * [[.:mapreduce-tutorial:Step 15]]: K-means algorithm+  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model. 
 +  * [[.:mapreduce-tutorial:Step 15]]: K-means clustering. 
 + 
 +=== Beyond MapReduce === 
 +  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce. 
 + 
 +===== Day 2 ===== 
 + 
 +Today we will be using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]. 
 + 
 +=== Environment === 
 +  * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment. 
 +  * [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting up Eclipse. 
 + 
 +=== Java Hadoop basics === 
 +  * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types. 
 +  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters. 
 +  * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners. 
 +  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration. 
 +  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file. 
 + 
 +=== Advanced topics === 
 +  * [[.:mapreduce-tutorial:Step 28]]: Custom data types. 
 +  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators. 
 +  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats. 
 + 
 +=== Beyond MapReduce === 
 +  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.
  
 ===== Other ===== ===== Other =====
   * [[user:majlis:hadoop|Further information]]   * [[user:majlis:hadoop|Further information]]
  