Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial [2012/01/27 16:22] straka
courses:mapreduce-tutorial [2012/02/05 20:01] straka
Line 6: Line 6:
   * [[.:mapreduce-tutorial:Introduction]]
 
+===== Overview =====
+  * [[.:mapreduce-tutorial:Hadoop job overview]]
+  * [[.:mapreduce-tutorial:Managing a Hadoop cluster]]
+  * [[.:mapreduce-tutorial:Running jobs]]
+  * [[.:mapreduce-tutorial:Perl API]], [[.http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]
+  * [[.:mapreduce-tutorial:Making your job configurable]]
+  * [[.:mapreduce-tutorial:If things go wrong]]
 
 ===== Day 1 =====
Line 23: Line 30:
  
 === MapReduce extended ===
-From now on, it is best to run MR jobs using a one-machine cluster -- create a one-machine cluster using ''hadoop-cluster'' for 3h (10800s) and run jobs using ''-jt cluster_master''. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job.
   * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning.
   * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties.
Line 32: Line 38:
 === Advanced MapReduce exercises ===
 Exercises in this section can be done in any order, but it is recommended to try solving all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy.
-  * [[.:mapreduce-tutorial:Step 13]]: Sorting
-  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model
-  * [[.:mapreduce-tutorial:Step 15]]: K-means clustering
+  * [[.:mapreduce-tutorial:Step 13]]: Sorting.
+  * [[.:mapreduce-tutorial:Step 14]]: N-gram language model.
+  * [[.:mapreduce-tutorial:Step 15]]: K-means clustering
+
+=== Beyond MapReduce ===
+  * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce.
 
 ===== Day 2 =====
Line 46: Line 55:
 === Java Hadoop basics ====
   * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types.
-  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs.
+  * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters.
   * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners.
-  * [[.:mapreduce-tutorial:Step 26]]: Counters, compression
-  * [[.:mapreduce-tutorial:Step 27]]: Reusing Mapper and Reducer code.
-
-=== Exercises ===
-  * Is [[.:mapreduce-tutorial:Step 13]], [[.:mapreduce-tutorial:Step 14]] and [[.:mapreduce-tutorial:Step 15]] enough?
+  * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration
+  * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file.
 
 === Advanced topics ===
-  * Custom input format -- WholeFile and WholeFileAsPath
-  * Custom data type -- Pair<A, B>
+  * [[.:mapreduce-tutorial:Step 28]]: Custom data types.
+  * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators.
+  * [[.:mapreduce-tutorial:Step 30]]: Custom input formats.
+
+=== Beyond MapReduce ===
+  * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce.
 
 ===== Other =====
   * [[user:majlis:hadoop|Further information]]
  
