Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial [2012/01/23 21:36]
straka
courses:mapreduce-tutorial [2012/02/05 20:01]
straka
Line 6: Line 6:
   * [[.:mapreduce-tutorial:Introduction]]   * [[.:mapreduce-tutorial:Introduction]]
  
 +===== Overview =====
 +  * [[.:mapreduce-tutorial:Hadoop job overview]]
 +  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
 +  * [[.:mapreduce-tutorial:Running jobs]]
 +  * [[.:mapreduce-tutorial:Perl API]], [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
 +  * [[.:mapreduce-tutorial:Making your job configurable]]
 +  * [[.:mapreduce-tutorial:If things go wrong]]
  
 ===== Day 1 ===== ===== Day 1 =====
 Today we will be using the [[.:mapreduce-tutorial:Perl API]] (there is no need to study it now, the tutorial will explain it). Today we will be using the [[.:mapreduce-tutorial:Perl API]] (there is no need to study it now, the tutorial will explain it).
 +=== Environment ===
   * [[.:mapreduce-tutorial:Step 1]]: Setting the environment.   * [[.:mapreduce-tutorial:Step 1]]: Setting the environment.
 +
 +=== MapReduce basics ===
 +  * [[.:mapreduce-tutorial:Step 2]]: Input and output format, testing data.
 +  * [[.:mapreduce-tutorial:Step 3]]: Basic mapper.
 +  * [[.:mapreduce-tutorial:Step 4]]: Counters.
 +  * [[.:mapreduce-tutorial:Step 5]]: Basic reducer.
 +
 +=== Controlling the cluster ===
 +  * [[.:mapreduce-tutorial:Step 6]]: Running on cluster.
 +  * [[.:mapreduce-tutorial:Step 7]]: Dynamic Hadoop cluster for several computations.
 +
 +=== MapReduce extended ===
 +  * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.
 +  * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
 +  * [[.:mapreduce-tutorial:Step 10]]: Combiners.
 +  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.
 +  * [[.:mapreduce-tutorial:Step 12]]: Additional output from mappers and reducers.
 +
 +=== Advanced MapReduce exercises ===
 +Exercises in this section can be done in any order, but it is recommended to try solving all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy.
 +  * [[.:mapreduce-tutorial:Step 13]]: Sorting.
 +  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
 +  * [[.:mapreduce-tutorial:Step 15]]: K-means clustering.
 +
 +=== Beyond MapReduce ===
 +  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce.
 +
 +===== Day 2 =====
 +
 +Today we will be using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]].
 +
 +=== Environment ===
 +  * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment.
 +  * [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting Eclipse.
 +
 +=== Java Hadoop basics ===
 +  * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types.
 +  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters.
 +  * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners.
 +  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration.
 +  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file.
 +
 +=== Advanced topics ===
 +  * [[.:mapreduce-tutorial:Step 28]]: Custom data types.
 +  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators.
 +  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats.
 +
 +=== Beyond MapReduce ===
 +  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.
  
 ===== Other ===== ===== Other =====
   * [[user:majlis:hadoop|Further information]]   * [[user:majlis:hadoop|Further information]]
  
