====== MapReduce Tutorial : Hadoop job overview ======
+ | |||
+ | A regular Hadoop job consists of: | ||
  * [required] //a mapper// -- processes input (key, value) pairs and produces (key, value) pairs. There can be multiple mappers: each input file is divided into splits (32MB by default) and each split is processed by one mapper. Script ''/
  * [optional] //a reducer// -- in ascending order of keys, it processes each key together with all its associated values and produces (key, value) pairs. The user can specify the number of reducers: 0, 1 or more; the default is 1.
  * [optional] //a combiner// -- a reducer which is executed locally on the output of a mapper.
  * [optional] //a partitioner// -- decides which reducer each (key, value) pair produced by a mapper is sent to.
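
To make the four components concrete, here is a minimal sketch using the standard Hadoop Java MapReduce API. The word-count logic, the class names, and the first-letter partitioner are only illustrative assumptions, not part of this tutorial's exercises:

<code java>
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper [required]: processes input (key, value) pairs,
  // produces (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer [optional]: receives each key with all its values and sums the
  // counts. Because summing is associative, the same class can also serve
  // as the combiner, running locally on each mapper's output.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      result.set(sum);
      context.write(key, result);
    }
  }

  // Partitioner [optional]: decides which reducer each (key, value) pair
  // goes to -- here, by the first character of the key.
  public static class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
      return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);              // optional combiner
    job.setReducerClass(IntSumReducer.class);               // optional reducer
    job.setPartitionerClass(FirstLetterPartitioner.class);  // optional partitioner
    job.setNumReduceTasks(2);                               // user-specified reducer count
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
</code>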
+ | |||
+ | An AllReduce Hadoop job ([[.: | ||
+ | |||
+ | Any Hadoop job can run: | ||
+ | * on a cluster. A separate process is used for every mapper and reducer. | ||
+ | * locally. No processes are created, the computation runs using only a single thread. Useful for debugging. //Warning: in this mode, there cannot be more than 1 reducer. This is a deficiency of Hadoop, which is already fixed in the development version.// |
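
As a hedged illustration, assuming the Hadoop 1.x-era configuration keys current at the time of writing (the key names changed in later Hadoop versions), a driver can force the local, single-threaded mode like this:

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LocalRun {
  // Build a job configured to run locally in a single process,
  // without contacting a JobTracker or HDFS.
  public static Job localJob() throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local"); // use the in-process LocalJobRunner
    conf.set("fs.default.name", "file:///"); // read and write the local filesystem
    Job job = new Job(conf, "word count (local)");
    job.setNumReduceTasks(1); // local mode supports at most one reducer
    return job;
  }
}
</code>

The single-reducer restriction mentioned above comes from Hadoop's ''LocalJobRunner'', which executes the whole job inside the submitting JVM.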