Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-8 [2012/01/29 21:01] straka |
courses:mapreduce-tutorial:step-8 [2012/01/29 21:12] straka |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== MapReduce Tutorial : Multiple mappers, reducers and partitioning ====== | ====== MapReduce Tutorial : Multiple mappers, reducers and partitioning ====== | ||
- | It is important for a job, which should | + | A Hadoop |
===== Multiple mappers ===== | ===== Multiple mappers ===== | ||
Line 21: | Line 21: | ||
By default, a (key, value) pair is sent to the reducer with number //hash(key) modulo number_of_reducers// | By default, a (key, value) pair is sent to the reducer with number //hash(key) modulo number_of_reducers// | ||
- | To override the default behaviour, MR job can specify a // | + | To override the default behaviour, MR job can specify a // |
+ | |||
+ | A partitioner should be provided if | ||
+ | * the default partitioner fails to distribute the data evenly among the reducers, i.e., some reducers process much more data than others. | ||
+ | * you need explicit control over (key, value) placement. This can happen, for example, when [[.:step-13|sorting data]]. | ||
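The default rule and the second case above can be illustrated with a minimal, framework-independent sketch. It does not use the tutorial's Perl API: ''toy_hash'', ''default_partition'' and ''first_letter_partition'' are hypothetical names introduced only for this illustration, and the hash function is a simple stand-in that may differ from the one the framework actually uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the framework's hash function: a simple
# polynomial string hash kept within 31 bits. The real hash may differ.
sub toy_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % 2147483648 for split //, $key;
    return $h;
}

# Default rule from the text: reducer number = hash(key) modulo number_of_reducers.
sub default_partition {
    my ($key, $reducers) = @_;
    return toy_hash($key) % $reducers;
}

# Hypothetical custom rule giving explicit control over placement:
# keys starting with a-m go to reducer 0, all other keys to reducer 1.
sub first_letter_partition {
    my ($key) = @_;
    return lc(substr($key, 0, 1)) le 'm' ? 0 : 1;
}

# Show where each key would be placed under both rules with 2 reducers.
for my $key (qw(apple banana zebra)) {
    printf "%s: default=%d custom=%d\n",
        $key, default_partition($key, 2), first_letter_partition($key);
}
```

With the custom rule, every key handled by reducer 0 precedes every key handled by reducer 1, which is the kind of placement guarantee a sorting job relies on. The default rule gives no such ordering, and it can also overload one reducer when many keys hash to the same remainder.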
<code perl> | <code perl> |