Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-10 [2012/01/25 19:01]
straka
+++ courses:mapreduce-tutorial:step-10 [2012/01/31 09:38] (current)
straka Change Perl commandline syntax.
@@ Line 3: / Line 3: @@
 Sometimes the reduce operation is a binary operation that is associative and commutative, e.g. ''+''. In that case it is inefficient to produce all the (key, value) pairs in the mappers and send them through the network.
-Instead, reducer can be executed right after the map, on //some portion// of values belonging to the same key. Only the results are then sent through the network.
+Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the aggregated results are then sent through the network.
-A Hadoop job can have such locally executed reducer, called //combiner//. If a combiner is specified, the output of a mapper is processed by a combiner before sending the pairs to reducer. The combiner may be invoked 0, 1 or multiple times, usually when the data are written to disk.
+A Hadoop job can have such a locally executed reducer, called a //combiner//. If a combiner is specified, the output of a mapper is processed by the combiner before the pairs are sent to the reducer. The combiner may be invoked zero, one or multiple times, usually when the data are written to disk.
 Typically, the combiner is the same as the reducer of a MR job.
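The effect described above can be illustrated without Hadoop. The following is only a minimal sketch in plain Perl, not the tutorial's ''Hadoop::Runner'' API: the helper names ''map_line'' and ''sum_by_key'' and the two toy input splits are assumptions made up for this illustration. It counts how many (key, value) pairs would leave the mappers with and without a local summing step, and checks that the final counts agree.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map step: emit a (word, 1) pair for every non-empty word of a line.
sub map_line {
    my ($line) = @_;
    return map { [$_, 1] } grep { length } split /\W+/, $line;
}

# Word-count reduce step: sum the values of each key.
# Since + is associative and commutative, the same function
# can also be applied locally as a combiner.
sub sum_by_key {
    my (@pairs) = @_;
    my %sum;
    $sum{ $_->[0] } += $_->[1] for @pairs;
    return map { [$_, $sum{$_}] } sort keys %sum;
}

# Two hypothetical mappers, each processing its own input split.
my @splits = (
    ['a b a', 'b a'],
    ['a c',   'c c a'],
);

my (@sent_plain, @sent_combined);
for my $split (@splits) {
    my @pairs = map { map_line($_) } @$split;
    push @sent_plain,    @pairs;                # no combiner: every pair is sent
    push @sent_combined, sum_by_key(@pairs);    # combiner: one pair per key per mapper
}

# The reducer sees whatever was sent; both variants must give the same totals.
my %plain    = map { @$_ } sum_by_key(@sent_plain);
my %combined = map { @$_ } sum_by_key(@sent_combined);

printf "pairs sent without combiner: %d\n", scalar @sent_plain;
printf "pairs sent with combiner:    %d\n", scalar @sent_combined;
printf "%s\t%d\t%d\n", $_, $plain{$_}, $combined{$_} for sort keys %plain;
```

In this toy input the mappers send 10 pairs without the local summing step and 4 pairs with it, while the final counts are identical. In a real job the saving depends on how often keys repeat within one mapper's output and on how many times the combiner is actually invoked.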
-<code perl>
+<file perl>
-package Mapper;
+package My::Mapper;
-use Moose;
+...
-with 'Hadoop::Mapper';
-sub map {
+package My::Reducer;
-  my ($self, $key, $value, $context) = @_;
+...
-  foreach my $word (split /\W/, $value) {
+package main;
-    next if not length $word;
+use Hadoop::Runner;
-    $context->write($word, 1);
-  }
-}
-package Reducer;
+my $runner = Hadoop::Runner->new(
-use Moose;
+  mapper => My::Mapper->new(),
-with 'Hadoop::Reducer';
+  combiner => My::Reducer->new(), # Specify the combiner.
+  reducer => My::Reducer->new(),
+  input_format => 'KeyValueTextInputFormat');
+...
+</file>
-sub reduce {
+===== Exercise =====
-  my ($self, $key, $values, $context) = @_;
-  my $sum = 0;
+Compare the effect of adding the combiner to a MR job which counts occurrences of words in ''/home/straka/wiki/cs-text-medium'': {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-10-wc-without-combiner.pl}} and {{:courses:mapreduce-tutorial:step-10.txt|step-10-wc-with-combiner.pl}}.
-  while ($values->next) {
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-10-wc-without-combiner.pl'
-    $sum += $values->value;
+  # NOW VIEW THE FILE
-  }
+  # $EDITOR step-10-wc-without-combiner.pl
+  rm -rf step-10-out-wout; time perl step-10-wc-without-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-wout
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-10-wc-with-combiner.pl'
+  # NOW VIEW THE FILE
+  # $EDITOR step-10-wc-with-combiner.pl
+  rm -rf step-10-out-with; time perl step-10-wc-with-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-with
-  $context->write($key, $sum);
+How would you explain the results?
-}
-package Main;
-use Hadoop::Runner;
-my $runner = Hadoop::Runner->new(
-  mapper => Mapper->new(),
-  combiner => Reducer->new(),
-  reducer => Reducer->new(),
-  input_format => 'KeyValueTextInputFormat');
-$runner->run();
+----
-</code>
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-9|Step 9]]: Hadoop properties.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-11|Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.<html></td>
+</tr>
+</table>
+</html>


Institute of Formal and Applied Linguistics Wiki
