Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-10 [2012/01/25 19:01] straka
courses:mapreduce-tutorial:step-10 [2012/01/31 09:38] (current) straka: Change Perl commandline syntax.
Line 3:
 Sometimes the reduce is a binary operation, which is associative and commutative, e.g. ''+''. In that case it is inefficient to produce all the (key, value) pairs in the mappers and send them through the network.
  
-Instead, reducer can be executed right after the map, on //some portion// of values belonging to the same key. Only the results are then sent through the network.
+Instead, reducer can be executed right after the map, on //some portion// of values belonging to the same key. Only the aggregated results are then sent through the network.
  
-A Hadoop job can have such locally executed reducer, called //combiner//. If a combiner is specified, the output of a mapper is processed by a combiner before sending the pairs to reducer. The combiner may be invoked 0, 1 or multiple times, usually when the data are written to disk.
+A Hadoop job can have such locally executed reducer, called //combiner//. If a combiner is specified, the output of a mapper is processed by a combiner before sending the pairs to reducer. The combiner may be invoked 0, 1 or multiple times, usually when the data are written to disk.
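Because the combiner may run zero, one, or several times, the final result must not depend on how often it is invoked. The following is a minimal sketch of that property for summation, in plain Perl without Hadoop. The values and the helper ''combine'' are hypothetical illustrations, not part of the Hadoop Perl API used in this tutorial.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical values emitted by one mapper for a single key.
my @values = (1, 1, 1, 1, 1, 1);

# Combiner and reducer are the same operation here: summation.
sub combine { return sum(@_) }

# 0 invocations: the reducer sees the raw values.
my $zero = combine(@values);

# 1 invocation: the combiner collapses all values before the reducer runs.
my $once = combine( combine(@values) );

# Multiple invocations: the combiner runs on two separate portions (for
# example, two spills to disk) and the reducer sums the partial results.
my $twice = combine( combine(@values[0 .. 3]), combine(@values[4 .. 5]) );

print "0 runs: $zero, 1 run: $once, 2 runs: $twice\n";
```

All three variants yield 6, which is what associativity and commutativity of ''+'' guarantee.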
  
 Typically, the combiner is the same as the reducer of a MR job.
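As a rough illustration of why reusing the reducer as a combiner pays off for a word count, the sketch below simulates two mappers in plain Perl, without Hadoop. The mapper outputs and the helper ''sum_by_key'' are made up for this example and are not part of the Hadoop Perl API.

```perl
use strict;
use warnings;

# Hypothetical output of two mappers: (word, 1) pairs, as in a word count.
my @mapper_outputs = (
  [ [a => 1], [b => 1], [a => 1], [a => 1] ],
  [ [b => 1], [a => 1], [b => 1] ],
);

# The reduce operation: sum the values of each key. The same sub serves as
# both combiner and reducer.
sub sum_by_key {
  my (@pairs) = @_;
  my %sum;
  $sum{ $_->[0] } += $_->[1] for @pairs;
  return map { [ $_, $sum{$_} ] } sort keys %sum;
}

# Without a combiner, every pair is sent to the reducer.
my @sent_plain = map { @$_ } @mapper_outputs;

# With a combiner, each mapper's output is summed locally first.
my @sent_combined = map { sum_by_key(@$_) } @mapper_outputs;

# The reducer produces the final counts in both cases.
my %plain    = map { @$_ } sum_by_key(@sent_plain);
my %combined = map { @$_ } sum_by_key(@sent_combined);

printf "pairs sent without combiner: %d\n", scalar @sent_plain;
printf "pairs sent with combiner:    %d\n", scalar @sent_combined;
printf "%s: %d / %d\n", $_, $plain{$_}, $combined{$_} for sort keys %plain;
```

With this made-up input, 7 pairs are sent without the combiner and 4 with it, while the final counts (a: 4, b: 3) are identical.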
  
-<code perl>
-package Mapper;
-use Moose;
-with 'Hadoop::Mapper';
+<file perl>
+package My::Mapper;
+...
 
-sub map {
-  my ($self, $key, $value, $context) = @_;
+package My::Reducer;
+...
 
-  foreach my $word (split /\W/, $value) {
-    next if not length $word;
-    $context->write($word, 1);
-  }
-}
+package main;
+use Hadoop::Runner;
 
-package Reducer;
-use Moose;
-with 'Hadoop::Reducer';
+my $runner = Hadoop::Runner->new(
+  mapper => My::Mapper->new(),
+  combiner => My::Reducer->new(), # Specify the combiner.
+  reducer => My::Reducer->new(),
+  input_format => 'KeyValueTextInputFormat');
+...
+</file>
 
-sub reduce {
-  my ($self, $key, $values, $context) = @_;
+===== Exercise =====
 
-  my $sum = 0;
-  while ($values->next) {
-    $sum += $values->value;
-  }
+Compare the effect of adding the combiner to a MR job which counts occurrences of words in ''/home/straka/wiki/cs-text-medium'': {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-10-wc-without-combiner.pl}} and {{:courses:mapreduce-tutorial:step-10.txt|step-10-wc-with-combiner.pl}}.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-10-wc-without-combiner.pl'
+  # NOW VIEW THE FILE
+  # $EDITOR step-10-wc-without-combiner.pl
+  rm -rf step-10-out-wout; time perl step-10-wc-without-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-wout
+
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-10-wc-with-combiner.pl'
+  # NOW VIEW THE FILE
+  # $EDITOR step-10-wc-with-combiner.pl
+  rm -rf step-10-out-with; time perl step-10-wc-with-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-with
 
-  $context->write($key, $sum);
-}
-
-package Main;
-use Hadoop::Runner;
-
-my $runner = Hadoop::Runner->new(
-  mapper => Mapper->new(),
-  combiner => Reducer->new(),
-  reducer => Reducer->new(),
-  input_format => 'KeyValueTextInputFormat');
+How would you explain the results?
 
-$runner->run();
-</code>
+----
 
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-9|Step 9]]: Hadoop properties.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-11|Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.<html></td>
+</tr>
+</table>
+</html>
