courses:mapreduce-tutorial [2012/01/27 00:44] straka |
courses:mapreduce-tutorial [2012/01/28 20:18] straka |
| |
=== MapReduce extended === | === MapReduce extended === |
From now on, it is best to run MR jobs using a one-machine cluster. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job. | From now on, it is best to run MR jobs using a one-machine cluster: create one using ''hadoop-cluster'' for 3h (10800s) and run jobs using ''-jt cluster_master''. Running the scripts locally without any cluster has several disadvantages, most notably having only one reducer per job. |
* [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning. | * [[.:mapreduce-tutorial:Step 8]]: Multiple mappers, reducers and partitioning. |
* [[.:mapreduce-tutorial:Step 9]]: Hadoop properties. | * [[.:mapreduce-tutorial:Step 9]]: Hadoop properties. |
=== Advanced MapReduce exercises === | === Advanced MapReduce exercises === |
Exercises in this section can be solved in any order, but it is recommended to try solving all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy. | Exercises in this section can be solved in any order, but it is recommended to try solving all of them. The [[.:mapreduce-tutorial:Perl API|Perl API reference]] may come in handy. |
* [[.:mapreduce-tutorial:Step 13]]: Sorting | * [[.:mapreduce-tutorial:Step 13]]: Sorting. |
* [[.:mapreduce-tutorial:Step 14]]: N-gram language model | * [[.:mapreduce-tutorial:Step 14]]: N-gram language model. |
* [[.:mapreduce-tutorial:Step 15]]: K-means clustering | * [[.:mapreduce-tutorial:Step 15]]: K-means clustering. |
| |
===== Day 2 ===== | ===== Day 2 ===== |
| |
=== Environment === | === Environment === |
* [[.:mapreduce-tutorial:Step 21]]: Preparing the environment. | * [[.:mapreduce-tutorial:Step 21]]: Preparing the environment |
* [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting up Eclipse. | * [[.:mapreduce-tutorial:Step 22]]: Optional -- Setting up Eclipse |
| |
=== Java Hadoop basics === | === Java Hadoop basics === |
* [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types. | * [[.:mapreduce-tutorial:Step 23]]: Predefined formats and types |
* [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs. | * [[.:mapreduce-tutorial:Step 24]]: Mappers, running Java Hadoop jobs |
* [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners. | * [[.:mapreduce-tutorial:Step 25]]: Reducers, combiners and partitioners |
* [[.:mapreduce-tutorial:Step 26]]: Counters, compression. | * [[.:mapreduce-tutorial:Step 26]]: Counters and job configuration |
* [[.:mapreduce-tutorial:Step 27]]: Reusing Mapper and Reducer code. | |
| |
=== Exercises === | |
| |
=== Advanced topics === | === Advanced topics === |
* Custom input format -- WholeFile and WholeFileAsPath | * [[.:mapreduce-tutorial:Step 27]]: Custom data types |
* Custom data type -- Pair<A, B> | * [[.:mapreduce-tutorial:Step 28]]: Custom input formats |
| * [[.:mapreduce-tutorial:Step 29]]: Running multiple Hadoop jobs |
| |
===== Other ===== | ===== Other ===== |
* [[user:majlis:hadoop|Further information]] | * [[user:majlis:hadoop|Further information]] |
| |