Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-8 [2012/01/29 21:01]
straka
+++ courses:mapreduce-tutorial:step-8 [2012/01/29 21:12]
straka
@@ Line 1: / Line 1: @@
 ====== MapReduce Tutorial : Multiple mappers, reducers and partitioning ======
-It is important for a job, which should run on many computers at the same time, to use multiple mappers and reducers. It is possible to control these numbers to some degree.
+A Hadoop job that is expected to run on many computers at the same time needs to use multiple mappers and reducers. These numbers can be controlled to some degree.
 ===== Multiple mappers =====
@@ Line 21: / Line 21: @@
 By default, a (key, value) pair is sent to the reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values of a given key are processed by the same reducer.
-To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper, together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs:
+To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper, together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs.
+A partitioner should be provided if
+  * the default partitioner fails to distribute the data evenly among the reducers, i.e., some reducers process much more data than others.
+  * you need explicit control over (key, value) placement. This can happen, for example, when [[.:step-13|sorting data]].
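The two placement rules described above can be sketched in plain Perl. This is a minimal illustration that does not use the tutorial's Hadoop API: the function names ''key_hash'', ''default_partition'' and ''range_partition'' are hypothetical, and ''key_hash'' is only a stand-in for whatever hash the framework actually applies to keys. The first-letter range rule is one possible custom partitioner, assuming keys that start with ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the framework's key hash (an assumption,
# not the hash Hadoop really uses).
sub key_hash {
    my ($key) = @_;
    my $hash = 0;
    for my $char (split //, $key) {
        $hash = ($hash * 31 + ord($char)) % 2_147_483_647;
    }
    return $hash;
}

# Default rule from the text: hash(key) modulo number_of_reducers.
# The value is ignored, so all values of one key reach the same reducer.
sub default_partition {
    my ($key, $value, $reducers) = @_;
    return key_hash($key) % $reducers;
}

# Illustrative custom rule: assign keys to reducers by alphabetical range
# of their first letter, so lower-numbered reducers get smaller keys.
sub range_partition {
    my ($key, $value, $reducers) = @_;
    my $first = lc substr($key, 0, 1);
    return 0             if $first lt 'a';   # empty key, digits, punctuation
    return $reducers - 1 if $first gt 'z';
    return int((ord($first) - ord('a')) * $reducers / 26);
}
```

Both functions return a zero-based reducer number smaller than the number of reducers. The range rule keeps keys ordered across reducers, which is the kind of explicit placement sorting needs, but it balances the load only if first letters are spread roughly evenly.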
 <code perl>


Institute of Formal and Applied Linguistics Wiki
