courses:mapreduce-tutorial:hadoop-job-overview, revision [2012/02/06 06:11] (current) by straka; previous revision [2012/02/05 19:31] by straka
====== MapReduce Tutorial : Hadoop job overview ======
A regular Hadoop job consists of:
  * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each input file is divided into splits (32MB by default) and each split is processed by one mapper. Script ''/
  * [optional] //a reducer// -- processes each key together with all its associated values, in ascending order of keys, and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1.
  * [optional] //a combiner// -- a reducer which is executed locally on the output of a mapper.
  * [optional] //a partitioner//
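The data flow between these components can be illustrated by a minimal, single-process Python sketch. It is not Hadoop's API; every name in it (''run_job'', ''word_count_mapper'', ''stable_partitioner'' and so on) is hypothetical. It assumes splits are formed by counting records rather than by size (Hadoop uses a byte size such as the 32MB mentioned above), and it assumes the usual role of a partitioner: choosing which reducer receives each key.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]

def make_splits(records: List[Pair], records_per_split: int) -> List[List[Pair]]:
    # Assumption: split by record count; Hadoop splits input files by size.
    return [records[i:i + records_per_split]
            for i in range(0, len(records), records_per_split)]

def group_sorted(pairs: Iterable[Pair]) -> List[Tuple[Any, List[Any]]]:
    # Collect all values of each key and return the keys in ascending order.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def apply_grouped(fn, pairs: Iterable[Pair]) -> List[Pair]:
    # Call a reducer-style function once per key, in ascending key order.
    return [out for key, values in group_sorted(pairs) for out in fn(key, values)]

def identity_reducer(key, values):
    # Used when no reducer is given: passes every value through unchanged.
    for value in values:
        yield key, value

def stable_partitioner(key, num_reducers: int) -> int:
    # Hypothetical partitioner: picks a reducer from a checksum of the key.
    # Python's built-in str hash is randomized per process, so it is avoided.
    return sum(str(key).encode()) % num_reducers

def run_job(records, mapper, reducer=None, combiner=None,
            partitioner=stable_partitioner, num_reducers=1, records_per_split=2):
    # Map phase: every split is processed by exactly one mapper.
    map_outputs = [[out for key, value in split for out in mapper(key, value)]
                   for split in make_splits(records, records_per_split)]
    if num_reducers == 0:
        # Map-only job: each mapper's output is returned as is.
        return map_outputs
    if combiner is not None:
        # Combiner: a reducer executed locally on one mapper's output.
        map_outputs = [apply_grouped(combiner, out) for out in map_outputs]
    # Shuffle: the partitioner routes every pair to one of the reducers.
    partitions: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for out in map_outputs:
        for key, value in out:
            partitions[partitioner(key, num_reducers)].append((key, value))
    # Reduce phase: each reducer sees its keys in ascending order.
    reducer = reducer or identity_reducer
    return [apply_grouped(reducer, part) for part in partitions]

# Hypothetical word-count mapper and reducer, used only for illustration.
def word_count_mapper(line_number, line):
    for word in line.split():
        yield word, 1

def sum_reducer(word, counts):
    yield word, sum(counts)

if __name__ == "__main__":
    records = list(enumerate(["a b a", "b c", "a"]))
    print(run_job(records, word_count_mapper, sum_reducer, combiner=sum_reducer))
```

With ''num_reducers=0'' the sketch returns one output list per mapper, mirroring a map-only job; with more reducers, each one produces its own output list. Everything runs sequentially in one thread, which keeps the example easy to trace but says nothing about how Hadoop schedules tasks.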
An AllReduce Hadoop job ([[.:

Any Hadoop job can run:
  * on a cluster. A separate process is used for every mapper and reducer.
  * locally. No processes are created; the computation runs in a single thread, which is useful for debugging. //Warning: in this mode there cannot be more than 1 reducer. This is a deficiency of Hadoop that is already fixed in the development version.//