
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-8 [2012/01/25 18:37] straka (previous revision: 2012/01/25 15:00, straka)
  
To use multiple reducers, the MR job must be executed by a cluster (even a cluster of one computer), not locally. The number of reducers is specified by the ''-r'' flag:
  perl script.pl run [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers]
  
==== Partitioning ====
By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values of a given key are processed by the same reducer.
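
The default rule can be simulated in a few lines of plain Perl. This sketch uses a hypothetical toy string hash (''toy_hash''), not the hash function Hadoop actually applies, and only illustrates the property stated above: all pairs with the same key end up in the same reducer.

<code perl>
use strict;
use warnings;

# Hypothetical toy string hash, NOT the hash function Hadoop really uses:
# a base-31 polynomial hash kept within 32 bits.
sub toy_hash {
  my ($key) = @_;
  my $h = 0;
  $h = ($h * 31 + ord($_)) % 4_294_967_296 for split //, $key;
  return $h;
}

# Default rule: reducer number = hash(key) modulo number_of_reducers.
sub default_partition {
  my ($key, $reducers) = @_;
  return toy_hash($key) % $reducers;
}

my $reducers = 3;
my @pairs = (["apple", 1], ["pear", 2], ["apple", 3], ["plum", 4], ["pear", 5]);

# Record which reducers received the pairs of each key.
my %reducers_of_key;
for my $pair (@pairs) {
  my ($key, $value) = @$pair;
  my $r = default_partition($key, $reducers);
  $reducers_of_key{$key}{$r} = 1;
  print "($key, $value) -> reducer $r\n";
}
</code>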
  
To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given each (key, value) pair produced by a mapper together with the number of reducers, and returns the zero-based number of the reducer to which this (key, value) pair belongs:
  
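<code perl>
package Partitioner;
use Moose;
with 'Hadoop::Partitioner';

# Called for every (key, value) pair produced by a mapper; must return
# the zero-based reducer number, i.e. a value in 0 .. $partitions - 1.
sub getPartition {
  my ($self, $key, $value, $partitions) = @_;

  # Assumes integer keys; e.g. the key 'apple' would numify to 0 here.
  return $key % $partitions;
}

...
package Main;
use Hadoop::Runner;

my $runner = Hadoop::Runner->new(
  ...
  partitioner => Partitioner->new(),
  ...);
...
</code>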

An MR job must have a reducer if it specifies a partitioner. Also, the partitioner is not called if there is only one reducer.
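
The framework code that invokes ''getPartition'' is not part of this tutorial. The following plain-Perl sketch (no Moose or Hadoop modules) imitates the contract described above with a hypothetical ''route_pairs'' function: the partitioner receives the key, the value and the number of reducers, must return a zero-based reducer number, and is skipped entirely when there is only one reducer.

<code perl>
use strict;
use warnings;

# Hypothetical stand-in for the framework's shuffle step (NOT Hadoop::Runner's
# code): route every (key, value) pair to a reducer using a partitioner.
sub route_pairs {
  my ($partitioner, $reducers, @pairs) = @_;
  my @buckets = map { [] } 1 .. $reducers;
  for my $pair (@pairs) {
    my ($key, $value) = @$pair;
    # With a single reducer the partitioner is not consulted at all.
    my $r = $reducers == 1 ? 0 : $partitioner->($key, $value, $reducers);
    die "partitioner returned invalid reducer $r\n"
      unless $r =~ /^\d+$/ && $r < $reducers;
    push @{ $buckets[$r] }, $pair;
  }
  return @buckets;
}

# Same rule as getPartition above: integer key modulo number of reducers.
my $modulo = sub { my ($key, $value, $partitions) = @_; return $key % $partitions };

my @pairs = map { [$_, "value$_"] } 0 .. 9;
my @buckets = route_pairs($modulo, 4, @pairs);
for my $r (0 .. $#buckets) {
  print "reducer $r: ", join(" ", map { $_->[0] } @{ $buckets[$r] }), "\n";
}
</code>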

===== The order of keys during reduce =====
It is guaranteed that every reducer processes the keys in //ascending order//.

On the other hand, when processing one key, the order of its values is undefined.
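
What a single reducer sees can be sketched in plain Perl with a hypothetical ''reducer_input'' function. It assumes that "ascending" means ordinary string comparison (Perl's default ''sort''), which this page does not specify.

<code perl>
use strict;
use warnings;

# Hypothetical simulation of one reducer's input (not the framework's code):
# pairs are grouped by key and keys are visited in ascending order.
# Assumption: "ascending" means plain string comparison (Perl's cmp).
sub reducer_input {
  my @pairs = @_;
  my %values_of;
  push @{ $values_of{ $_->[0] } }, $_->[1] for @pairs;
  # Keys are sorted; the order of values inside one key is NOT guaranteed
  # by the framework, so a real reducer must not rely on it.
  return map { [$_, $values_of{$_}] } sort keys %values_of;
}

my @pairs = (["pear", 1], ["apple", 2], ["plum", 3], ["apple", 4]);
for my $group (reducer_input(@pairs)) {
  my ($key, $values) = @$group;
  print "$key: ", join(" ", @$values), "\n";
}
</code>

Because the order of values is undefined, a reducer that needs them in a particular order has to sort them itself.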

===== Example =====

Run an MR job on '/home/straka/wiki/cs-text-medium' which simultaneously creates an ascending list of unique article names and an ascending list of unique words.

{{:courses:mapreduce-tutorial:step-8-solution.txt|Solution.pl}}
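
One possible way to approach the example, sketched as a plain-Perl simulation rather than a real Hadoop job and not necessarily the approach of the linked solution: a hypothetical mapper tags article names with ''A:'' and words with ''W:'', a partitioner sends each tag to its own reducer (so the job needs ''-r 2''), and each reducer emits its keys once. Splitting the text on whitespace is a simplifying assumption.

<code perl>
use strict;
use warnings;

# Hypothetical mapper: tags the article name with "A:" and each word with "W:".
# Assumption: words are maximal runs of non-whitespace characters.
sub map_article {
  my ($name, $text) = @_;
  return (["A:$name", 1], map { ["W:$_", 1] } $text =~ /\S+/g);
}

# Hypothetical partitioner: article names go to reducer 0, words to reducer 1.
# It ignores $partitions, so the job must be run with exactly two reducers.
sub partition_by_tag {
  my ($key, $value, $partitions) = @_;
  return $key =~ /^A:/ ? 0 : 1;
}

# Hypothetical reducer: receives its keys in ascending order (duplicates stand
# in for one key with several values) and emits every key once, untagged.
sub reduce_unique {
  my %seen;
  return map { substr($_, 2) } grep { !$seen{$_}++ } @_;
}

my @articles = (["Prague", "Prague is a city"], ["Brno", "Brno is a city"]);

# Simulated shuffle: route each mapper output pair to its reducer.
my @buckets = ([], []);
for my $article (@articles) {
  for my $pair (map_article(@$article)) {
    push @{ $buckets[ partition_by_tag(@$pair, 2) ] }, $pair->[0];
  }
}

# The framework sorts keys within each reducer; simulate that with sort.
my @outputs = map { [ reduce_unique(sort @$_) ] } @buckets;
print "articles: @{ $outputs[0] }\n";
print "words: @{ $outputs[1] }\n";
</code>

Since all keys in one reducer share the same tag, sorting the tagged keys gives the same order as sorting the untagged ones, so each reducer's output is one ascending list.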
