Differences
This shows you the differences between two versions of the page.
|
courses:mapreduce-tutorial:hadoop-job-overview [2012/02/05 19:31] straka |
courses:mapreduce-tutorial:hadoop-job-overview [2012/02/06 06:11] (current) straka |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== MapReduce Tutorial : Hadoop job overview ====== | ====== MapReduce Tutorial : Hadoop job overview ====== | ||
| - | A Hadoop job consists of: | + | A regular |
| * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each file is divided into splits (32MB by default) and each split is processed by one mapper. Script ''/ | * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each file is divided into splits (32MB by default) and each split is processed by one mapper. Script ''/ | ||
| * [optional] //a reducer// -- processes the keys in ascending order, each key together with all its associated values, and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1. | * [optional] //a reducer// -- processes the keys in ascending order, each key together with all its associated values, and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1. | ||
| * [optional] //a combiner// -- a reducer which is executed locally on the output of a mapper. | * [optional] //a combiner// -- a reducer which is executed locally on the output of a mapper. | ||
| - | * [optional] //a partitioner// | + | * [optional] //a partitioner// |
| - | A Hadoop job can run: | + | An AllReduce Hadoop job ([[.: |
| + | |||
| + | Any Hadoop job can run: | ||
| * on a cluster. A separate process is used for every mapper and reducer. | * on a cluster. A separate process is used for every mapper and reducer. | ||
| * locally. No processes are created; the computation runs in a single thread. Useful for debugging. //Warning: in this mode, there cannot be more than 1 reducer. This is a deficiency of Hadoop, which is already fixed in the development version.// | * locally. No processes are created; the computation runs in a single thread. Useful for debugging. //Warning: in this mode, there cannot be more than 1 reducer. This is a deficiency of Hadoop, which is already fixed in the development version.// | ||
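
The roles of the mapper, combiner, partitioner and reducer listed above can be sketched as a small single-threaded simulation in plain Python. This is only an illustration of the data flow, not the Hadoop API: the names ''run_job'' and ''group_sorted'', the word-count task and the partitioning rule (sum of character codes modulo the number of reducers) are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Emit (word, 1) for every word of an input line."""
    for word in value.split():
        yield word, 1


def reducer(key, values):
    """Sum all counts collected for one key."""
    yield key, sum(values)


# A combiner is a reducer executed locally on the output of one mapper.
combiner = reducer


def partitioner(key, num_reducers):
    """Choose the reducer for a key (hypothetical rule, for illustration only)."""
    return sum(map(ord, key)) % num_reducers


def group_sorted(pairs):
    """Sort pairs by key and yield (key, [values]) in ascending key order."""
    by_key = itemgetter(0)
    for key, group in groupby(sorted(pairs, key=by_key), key=by_key):
        yield key, [value for _, value in group]


def run_job(splits, num_reducers=1):
    """Run the simulated job; returns one output list per reducer."""
    if num_reducers == 0:
        # With no reducers, the mapper output is the job output.
        return [[pair for k, v in split for pair in mapper(k, v)]
                for split in splits]

    partitions = defaultdict(list)
    for split in splits:  # each split is processed by one mapper
        mapped = [pair for k, v in split for pair in mapper(k, v)]
        combined = [out for k, vs in group_sorted(mapped)
                    for out in combiner(k, vs)]
        for k, v in combined:
            partitions[partitioner(k, num_reducers)].append((k, v))

    # Each reducer sees its keys in ascending order with all their values.
    return [[out for k, vs in group_sorted(partitions[r])
             for out in reducer(k, vs)]
            for r in range(num_reducers)]


if __name__ == "__main__":
    splits = [[(0, "a b a")], [(0, "b c")]]
    print(run_job(splits, num_reducers=2))
```

Like the local mode described above, the sketch runs everything in one thread; on a cluster, each mapper and each reducer would be a separate process.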
