Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-8 [2012/01/25 14:54]
straka
courses:mapreduce-tutorial:step-8 [2012/01/25 15:29]
straka
Line 9:
  
 ===== Multiple reducers =====
 +The number of reducers is specified by the job; the default is one. Because the outputs of the reducers are not merged, there are as many output files as reducers.
  
 +To use multiple reducers, the MR job must be executed on a cluster (even one consisting of a single computer), not locally. The number of reducers is specified by the ''-r'' flag:
 +  perl script.pl [-j cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers]
  
 +==== Partitioning ====
 +When there are multiple reducers, it matters how the (key, value) pairs are distributed among them.
 +
 +By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values for a given key are processed by the same reducer.
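
The default rule can be illustrated in plain Perl. This is only a minimal sketch: ''simple_hash'' and ''default_partition'' are hypothetical helpers written for this illustration, and the hash function is a stand-in, not the one the framework actually uses.

```perl
use strict;
use warnings;

# Illustrative string hash (an assumption, not the framework's real hash).
sub simple_hash {
    my ($key) = @_;
    my $h = 0;
    # Reduce after every character so the value stays a small non-negative integer.
    $h = ($h * 31 + ord($_)) % 2_147_483_647 for split //, $key;
    return $h;
}

# Zero-based reducer number for a key: hash(key) modulo number_of_reducers.
sub default_partition {
    my ($key, $reducers) = @_;
    return simple_hash($key) % $reducers;
}

# Repeated keys always land on the same reducer.
my $reducers = 4;
for my $key (qw(praha brno praha ostrava brno)) {
    printf "%-8s -> reducer %d\n", $key, default_partition($key, $reducers);
}
```

Because the reducer number depends only on the key, every occurrence of a key is routed to the same reducer regardless of which mapper produced it.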
 +
 +To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given each (key, value) pair produced by a mapper, together with the number of reducers, and returns the zero-based number of the reducer to which the pair belongs:
 +
 +<code perl>
 +package Partitioner;
 +use Moose;
 +with 'Hadoop::Partitioner';
 +
 +sub getPartition {
 +  my ($self, $key, $value, $partitions) = @_;
 +
 +  # Assumes numeric keys; the result is a zero-based reducer number.
 +  return $key % $partitions;
 +}
 +
 +...
 +package Main;
 +use Hadoop::Runner;
 +
 +my $runner = Hadoop::Runner->new(
 +  ...
 +  partitioner => Partitioner->new(),
 +  ...);
 +...
 +</code>
 +
 +An MR job must have a reducer if it specifies a partitioner. Also, the partitioner is not called if there is only one reducer.
 +
 +===== Example =====
 +
 +Run an MR job on '/home/straka/wiki/cs-text-medium' that creates a list of unique article names and, at the same time, a list of unique words.
 +
 +{{:courses:mapreduce-tutorial:step-8-solution.txt|Solution.pl}}
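
One possible way to keep the two lists in separate output files is a partitioner that sends article names to one reducer and words to the others. The sketch below is not the linked solution. It assumes a hypothetical key convention in which the mapper emits ''A\t<title>'' for article names and ''W\t<word>'' for words, and ''get_partition'' is a plain function written for illustration. The same logic could be placed in the ''getPartition'' method of a partitioner class like the one shown earlier.

```perl
use strict;
use warnings;

# Hypothetical key convention (an assumption, not from the tutorial):
# "A\t<title>" marks an article name, "W\t<word>" marks a word.
sub get_partition {
    my ($key, $value, $partitions) = @_;

    # With fewer than two reducers there is nothing to separate.
    return 0 if $partitions < 2;

    # All article names go to reducer 0.
    return 0 if $key =~ /^A\t/;

    # Words are spread over reducers 1 .. $partitions - 1 by a simple hash.
    my $h = 0;
    $h = ($h * 31 + ord($_)) % 2_147_483_647 for split //, $key;
    return 1 + $h % ($partitions - 1);
}

printf "%s -> reducer %d\n", $_, get_partition($_, 1, 3)
    for ("A\tPraha", "W\tmesto", "W\tje");
```

With this split, the output file of reducer 0 would hold only article names and the remaining files only words, since reducer outputs are not merged.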