Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-8 [2012/01/25 15:00]
straka
+++ courses:mapreduce-tutorial:step-8 [2012/01/25 15:54]
straka
@@ Line 12: / Line 12: @@
 To use multiple reducers, the MR job must be executed by a cluster (even one consisting of a single computer), not locally. The number of reducers is specified by the ''-r'' flag:
-  perl script.pl [-j cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers]
+  perl script.pl run [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers]
 ==== Partitioning ====
@@ Line 19: / Line 19: @@
 By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values for a given key are processed by the same reducer.
-To override the default behaviour, MR job can specify a //partitioner//.
+To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given each (key, value) pair produced by a mapper together with the number of reducers, and returns the zero-based index of the reducer to which this (key, value) pair belongs:
+<code perl>
+package Partitioner;
+use Moose;
+with 'Hadoop::Partitioner';
+sub getPartition {
+  my ($self, $key, $value, $partitions) = @_;
+  # Assumes numeric keys: a string that does not start with a number numifies to 0 in Perl.
+  return $key % $partitions;
+}
+...
+package Main;
+use Hadoop::Runner;
+my $runner = Hadoop::Runner->new(
+  ...
+  partitioner => Partitioner->new(),
+  ...);
+...
+</code>
+An MR job must have a reducer if it specifies a partitioner. Also, the partitioner is not called if there is only one reducer.
+===== Example =====
+Run an MR job on '/home/straka/wiki/cs-text-medium' that creates a list of unique article names and, at the same time, a list of unique words.
+{{:courses:mapreduce-tutorial:step-8-solution.txt|Solution.pl}}
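
The default rule described in the diff, //hash(key) modulo number_of_reducers//, can be illustrated with a small standalone sketch. The names simple_hash and default_partition are hypothetical, and the hash function is a stand-in chosen for readability; it is not necessarily the one the framework uses, and the sketch does not depend on the Hadoop::Partitioner role.

```perl
use strict;
use warnings;

# Hypothetical stand-in hash: a 31-based polynomial hash over the
# characters of the key, reduced modulo a prime to keep values small.
sub simple_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % 1_000_003 for split //, $key;
    return $h;
}

# Hypothetical default rule: hash(key) modulo number_of_reducers.
# Returns a zero-based reducer index in the range 0 .. $partitions - 1.
sub default_partition {
    my ($key, $partitions) = @_;
    return simple_hash($key) % $partitions;
}

# The same key always maps to the same reducer index.
for my $key (qw(apple banana apple)) {
    printf "%s -> reducer %d\n", $key, default_partition($key, 4);
}
```

Because the index depends only on the key and the number of reducers, every value emitted for a given key reaches the same reducer, which is the guarantee the default behaviour relies on.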


Institute of Formal and Applied Linguistics Wiki
