Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial [2012/01/29 21:48] straka [xmlrpc dokuvimki edit]
courses:mapreduce-tutorial [2012/02/05 20:01] straka
Line 6: Line 6:
   * [[.:mapreduce-tutorial:Introduction]]
  
 +===== Overview =====
 +  * [[.:mapreduce-tutorial:Hadoop job overview]]
 +  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
 +  * [[.:mapreduce-tutorial:Running jobs]]
 +  * [[.:mapreduce-tutorial:Perl API]], [[.http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
 +  * [[.:mapreduce-tutorial:Making your job configurable]]
 +  * [[.:mapreduce-tutorial:If things go wrong]]
  
 ===== Day 1 =====
Line 23: Line 30:
  
 === MapReduce extended ===
-From now on, it is best to run MR jobs using a one-machine cluster -- create a one-machine cluster using ''hadoop-cluster'' for 3h (10800s) and run jobs using ''-jt cluster_master''. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job. 
   * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.
   * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
Line 35: Line 41:
   * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
   * [[.:mapreduce-tutorial:Step 15]]: K-means clustering.
 +
 +=== Beyond MapReduce ===
 +  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce.
  
 ===== Day 2 =====
Line 46: Line 55:
 === Java Hadoop basics ====
   * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types.
-  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs.
+  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters.
   * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners.
-  * [[.:mapreduce-tutorial:Step 26]]: Counters, compression and job configuration.
+  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration.
 +  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file.
  
 === Advanced topics ===
-  * [[.:mapreduce-tutorial:Step 27]]: Custom data types.
+  * [[.:mapreduce-tutorial:Step 28]]: Custom data types.
-  * [[.:mapreduce-tutorial:Step 28]]: Running multiple Hadoop jobs in one class.
+  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators.
-  * [[.:mapreduce-tutorial:Step 29]]: Custom input formats.
+  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats.
  
 === Beyond MapReduce ===
-  * [[.:mapreduce-tutorial:Step 30]]: Implementing iterative MapReduce jobs faster using All-Reduce.
+  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.
  
 ===== Other =====
   * [[user:majlis:hadoop|Further information]]
  
