Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial [2012/01/13 11:13] majlis
courses:mapreduce-tutorial [2012/01/25 15:46] straka
Line 7: Line 7:
  
  
 -===== Other =====
 +===== Day 1 =====
 +Today we will be using the [[.:mapreduce-tutorial:Perl API]] (there is no need to study it now; the tutorial will explain it).
 +=== Environment === 
 +  * [[.:mapreduce-tutorial:Step 1]]: Setting the environment. 
 + 
 +=== MapReduce basics === 
 +  * [[.:mapreduce-tutorial:Step 2]]: Input and output format, testing data. 
 +  * [[.:mapreduce-tutorial:Step 3]]: Basic mapper. 
 +  * [[.:mapreduce-tutorial:Step 4]]: Counters. 
 +  * [[.:mapreduce-tutorial:Step 5]]: Basic reducer. 
 + 
 +=== Controlling the cluster === 
 +  * [[.:mapreduce-tutorial:Step 6]]: Running on cluster. 
 +  * [[.:mapreduce-tutorial:Step 7]]: Dynamic Hadoop cluster for several computations. 
 + 
 +From now on, it is best to run MR jobs on a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably that each job gets only one reducer.
 + 
 +=== MapReduce extended === 
 +  * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning. 
 +  * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties. 
 +  * [[.:mapreduce-tutorial:Step 10]]: Properties of reducers, combiners. 
 +  * [[.:mapreduce-tutorial:Step 11]]: Initialization and cleanup of MR tasks. 
 +  * [[.:mapreduce-tutorial:Step 12]]: Additional output from mappers and reducers. 
 + 
 +=== Advanced MapReduce exercises === 
 +  * [[.:mapreduce-tutorial:Step 13]]: Sorting.
 +  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
 +  * [[.:mapreduce-tutorial:Step 15]]: K-means algorithm.
 + 
 +===== Other =====
   * [[user:majlis:hadoop|Further information]]
  
