courses:mapreduce-tutorial [2012/01/26 18:27] straka |
courses:mapreduce-tutorial [2012/02/05 20:01] (current) straka |
| * [[.:mapreduce-tutorial:Introduction]] | * [[.:mapreduce-tutorial:Introduction]] |
| |
| | ===== Overview ===== |
| | * [[.:mapreduce-tutorial:Hadoop job overview]] |
| | * [[.:mapreduce-tutorial:Managing a Hadoop cluster]] |
| | * [[.:mapreduce-tutorial:Running jobs]] |
| | * [[.:mapreduce-tutorial:Perl API]], [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]] |
| | * [[.:mapreduce-tutorial:Making your job configurable]] |
| | * [[.:mapreduce-tutorial:If things go wrong]] |
| |
| ===== Day 1 ===== | ===== Day 1 ===== |
| |
| === MapReduce extended === | === MapReduce extended === |
| From now on, it is best to run MR jobs using a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job. | |
| * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning. | * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning. |
| * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties. | * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties. |
| === Advanced MapReduce exercises === | === Advanced MapReduce exercises === |
| The exercises in this section can be done in any order, but we recommend trying to solve all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy. | The exercises in this section can be done in any order, but we recommend trying to solve all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy. |
| * [[.:mapreduce-tutorial:Step 13]]: Sorting | * [[.:mapreduce-tutorial:Step 13]]: Sorting. |
| * [[.:mapreduce-tutorial:Step 14]]: N-gram language model | * [[.:mapreduce-tutorial:Step 14]]: N-gram language model. |
| * [[.:mapreduce-tutorial:Step 15]]: K-means clustering | * [[.:mapreduce-tutorial:Step 15]]: K-means clustering. |
| | |
| | === Beyond MapReduce === |
| | * [[.:mapreduce-tutorial:Step 16]]: Implementing iterative MapReduce jobs faster using All-Reduce. |
| |
| ===== Day 2 ===== | ===== Day 2 ===== |
| |
| Today we will be using the Java API. | Today we will be using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/index.html|Java API]]. |
| |
| === Environment === | === Environment === |
| * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment. | * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment. |
| * [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting up Eclipse. | * [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting up Eclipse. |
| | |
| | === Java Hadoop basics === |
| | * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types. |
| | * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs, counters. |
| | * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners. |
| | * [[.:mapreduce-tutorial:Step 26]]: Compression and job configuration. |
| | * [[.:mapreduce-tutorial:Step 27]]: Running multiple Hadoop jobs in one source file. |
| | |
| | === Advanced topics === |
| | * [[.:mapreduce-tutorial:Step 28]]: Custom data types. |
| | * [[.:mapreduce-tutorial:Step 29]]: Custom sorting and grouping comparators. |
| | * [[.:mapreduce-tutorial:Step 30]]: Custom input formats. |
| | |
| | === Beyond MapReduce === |
| | * [[.:mapreduce-tutorial:Step 31]]: Implementing iterative MapReduce jobs faster using All-Reduce. |
| |
| ===== Other ===== | ===== Other ===== |
| * [[user:majlis:hadoop|Further information]] | * [[user:majlis:hadoop|Further information]] |
| |