Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial [2012/01/29 21:51]
straka [xmlrpc dokuvimki edit]
courses:mapreduce-tutorial [2012/02/05 20:01] (current)
straka
Line 6: Line 6:
   * [[.:mapreduce-tutorial:Introduction]]
  
 +===== Overview =====
 +  * [[.:mapreduce-tutorial:Hadoop job overview]]
 +  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
 +  * [[.:mapreduce-tutorial:Running jobs]]
 +  * [[.:mapreduce-tutorial:Perl API]], [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
 +  * [[.:mapreduce-tutorial:Making your job configurable]]
 +  * [[.:mapreduce-tutorial:If things go wrong]]
  
 ===== Day 1 =====
Line 23: Line 30:
  
 === MapReduce extended ===
-From now on, it is best to run MR jobs using a one-machine cluster -- create a one-machine cluster using ''hadoop-cluster'' for 3h (10800s) and run jobs using ''-jt cluster_master''. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job. 
   * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.
   * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
Line 35: Line 41:
   * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
   * [[.:mapreduce-tutorial:Step 15]]: K-means clustering.
 +
 +=== Beyond MapReduce ===
 +  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce.
  
 ===== Day 2 =====
Line 46: Line 55:
 === Java Hadoop basics ===
   * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types.
-  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs.
 +  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters.
   * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners.
-  * [[.:mapreduce-tutorial:Step 26]]: Counters, compression and job configuration.
 +  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration.
 +  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file.
  
 === Advanced topics ===
-  * [[.:mapreduce-tutorial:Step 27]]: Custom data types.
 +  * [[.:mapreduce-tutorial:Step 28]]: Custom data types.
-  * [[.:mapreduce-tutorial:Step 28]]: Running multiple Hadoop jobs in one class.
 +  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators.
-  * [[.:mapreduce-tutorial:Step 29]]: Custom input formats.
 +  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats.
  
 === Beyond MapReduce ===
-  * [[.:mapreduce-tutorial:Step 30]]: Implementing iterative MapReduce jobs faster using All-Reduce.
 +  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.
  
 ===== Other =====
   * [[user:majlis:hadoop|Further information]]
  
