Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-8 [2012/01/25 18:37] straka |
courses:mapreduce-tutorial:step-8 [2012/01/25 22:09] straka |
||
---|---|---|---|
Line 6: | Line 6: | ||
The number of mappers is determined automatically from the sizes of the input files. Every input file is divided into //splits//. The default split size is 32MB. Each split is then processed by a different mapper. | The number of mappers is determined automatically from the sizes of the input files. Every input file is divided into //splits//. The default split size is 32MB. Each split is then processed by a different mapper. | ||
- | The size of file split can be overridden by '' | + | The size of file split can be overridden by '' |
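As a rough illustration of how the split size determines the number of mappers, the following sketch counts splits for a list of file sizes. It assumes that each file is cut independently into pieces of at most the split size; the function name ''count_splits'' and the sample sizes are hypothetical, and the real framework may treat the last piece of a file slightly differently.

```python
MB = 1024 * 1024
DEFAULT_SPLIT_SIZE = 32 * MB  # default split size stated in the text


def count_splits(file_sizes, split_size=DEFAULT_SPLIT_SIZE):
    """Estimate the number of splits (one mapper per split).

    Assumption: every file is divided on its own into
    ceil(size / split_size) pieces; splits never span two files.
    """
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return sum(-(-size // split_size) for size in file_sizes)


sizes = [100 * MB, 10 * MB, 32 * MB]
print(count_splits(sizes))           # 4 + 1 + 1 splits -> prints 6
print(count_splits(sizes, 64 * MB))  # 2 + 1 + 1 splits -> prints 4
```

Under these assumptions, a larger split size yields fewer mappers for the same input.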
===== Multiple reducers ===== | ===== Multiple reducers ===== | ||
- | Then number of reducers is specified by the job, default number is one. As the outputs of reducers are not merged, there are as many output files as reducers. | + | The number of reducers is specified by the job, defaulting to one if unspecified. As the outputs of reducers are not merged, there are as many output files as reducers. |
To use multiple reducers, the MR job must be executed on a cluster (even one consisting of a single computer), not locally. The number of reducers is specified by '' | To use multiple reducers, the MR job must be executed on a cluster (even one consisting of a single computer), not locally. The number of reducers is specified by '' | ||
Line 17: | Line 17: | ||
When there are multiple reducers, it is important how the (key, value) pairs are distributed between the reducers. | When there are multiple reducers, it is important how the (key, value) pairs are distributed between the reducers. | ||
- | By default, (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers// | + | By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers// |
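The default rule can be sketched outside of any framework. The example below is only an illustration: ''zlib.crc32'' stands in for the framework's hash function (an assumption, the real hash is not specified here), and the names ''default_partition'' and ''distribute'' are hypothetical.

```python
import zlib
from collections import defaultdict


def default_partition(key, number_of_reducers):
    """Return the reducer index for a key: hash(key) modulo number_of_reducers.

    crc32 is used as a stand-in hash because it is deterministic across
    runs; the hash used by the real framework is an assumption here.
    """
    return zlib.crc32(key.encode("utf-8")) % number_of_reducers


def distribute(pairs, number_of_reducers):
    """Group (key, value) pairs by the reducer they would be sent to."""
    reducers = defaultdict(list)
    for key, value in pairs:
        reducers[default_partition(key, number_of_reducers)].append((key, value))
    return dict(reducers)


pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
for reducer, received in sorted(distribute(pairs, 3).items()):
    print(reducer, received)
```

Whatever hash is used, all pairs sharing a key end up at the same reducer, which is what allows a reducer to see every value of a key it processes.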
- | To override the default behaviour, MR job can specify a // | + | To override the default behaviour, an MR job can specify a // |
<code perl> | <code perl> | ||
Line 48: | Line 48: | ||
It is guaranteed that every reducer processes the keys in //ascending order//. | It is guaranteed that every reducer processes the keys in //ascending order//. | ||
- | On the other hand, when processing one key, the order of its values is undefined. | + | On the other hand, the order of values |
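The two ordering rules can be illustrated with a small simulation of what a single reducer receives. This is a sketch under stated assumptions, not the framework's implementation: the helper ''reducer_input'' is hypothetical, and the values happen to arrive in input order here, which a real job must not rely on.

```python
def reducer_input(pairs):
    """Yield (key, values) groups the way one reducer would see them.

    Keys are delivered in ascending order. The order of the values
    within a group is whatever order they arrived in, so it should be
    treated as undefined.
    """
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    for key in sorted(grouped):
        yield key, grouped[key]


pairs = [("b", 2), ("a", 5), ("b", 1), ("a", 3)]
for key, values in reducer_input(pairs):
    # A reducer that needs ordered values has to sort them itself.
    print(key, sorted(values))
# prints:
# a [3, 5]
# b [1, 2]
```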
===== Example ===== | ===== Example ===== | ||
Line 55: | Line 55: | ||
{{: | {{: | ||
+ |