
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-10 [2012/01/25 19:06]
straka
courses:mapreduce-tutorial:step-10 [2012/01/30 10:28]
majlis
Line 3: Line 3:
 Sometimes the reduce is a binary operation, which is associative and commutative, e.g. ''+''. In that case it is inefficient to produce all the (key, value) pairs in the mappers and send them through the network.
 
-Instead, reducer can be executed right after the map, on //some portion// of values belonging to the same key. Only the results are then sent through the network.
+Instead, reducer can be executed right after the map, on //some portion// of values belonging to the same key. Only the aggregated results are then sent through the network.
 
-A Hadoop job can have such locally executed reducer, called //combiner//. If a combiner is specified, the output of a mapper is processed by a combiner before sending the pairs to reducer. The combiner may be invoked 0, 1 or multiple times, usually when the data are written to disk.
+A Hadoop job can have such locally executed reducer, called //combiner//. If a combiner is specified, the output of a mapper is processed by a combiner before sending the pairs to reducer. The combiner may be invoked 0, 1 or multiple times, usually when the data are written to disk.
 
 Typically, the combiner is the same as the reducer of a MR job.
  
-<code perl>
+<file perl>
-package Mapper;
+package My::Mapper;
 ...
 
-package Reducer;
+package My::Reducer;
 ...
 
-package Main;
+package main;
 use Hadoop::Runner;
 
 my $runner = Hadoop::Runner->new(
-  mapper => Mapper->new(),
-  combiner => Reducer->new(), # Specify the combiner.
-  reducer => Reducer->new(),
+  mapper => My::Mapper->new(),
+  combiner => My::Reducer->new(), # Specify the combiner.
+  reducer => My::Reducer->new(),
   input_format => 'KeyValueTextInputFormat');
 ...
-</code>
+</file>
  
-===== Excersise =====
+===== Exercise =====
 
-Compare the effect of adding the combiner to a MR job which counts occurences of words: {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} and {{:courses:mapreduce-tutorial:step-10.txt|wc-with-combiner.pl}}.
+Compare the effect of adding the combiner to a MR job which counts occurrences of words in ''/home/straka/wiki/cs-text-medium'': {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-10-wc-without-combiner.pl}} and {{:courses:mapreduce-tutorial:step-10.txt|step-10-wc-with-combiner.pl}}.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-10-wc-without-combiner.pl'
+  # NOW VIEW THE FILE
+  # $EDITOR step-10-wc-without-combiner.pl
+  rm -rf step-10-out-wout; time perl step-10-wc-without-combiner.pl run /home/straka/wiki/cs-text-medium/ step-10-out-wout
+
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-10-wc-with-combiner.pl'
+  # NOW VIEW THE FILE
+  # $EDITOR step-10-wc-with-combiner.pl
+  rm -rf step-10-out-with; time perl step-10-wc-with-combiner.pl run /home/straka/wiki/cs-text-medium/ step-10-out-with
+
+How would you explain the results?
+
+----
+
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-9|Step 9]]: Hadoop properties.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-11|Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.<html></td>
+</tr>
+</table>
+</html>
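
The combiner described in this revision can be illustrated without Hadoop. The sketch below is a plain-Perl simulation and does not use ''Hadoop::Runner''. The function names (''map_line'', ''sum_by_key'', ''run_job'') and the two input splits are made up for illustration. It assumes each split is processed by one mapper and that the combiner runs exactly once per mapper, whereas real Hadoop may invoke it zero or several times. The point is only that summing partial groups locally reduces the number of (key, value) pairs sent to the reducer without changing the final counts.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map step: emit one (word, 1) pair per whitespace-separated token.
sub map_line {
    my ($line) = @_;
    return map { [ $_, 1 ] } split ' ', $line;
}

# Reduce step, reused as the combiner: sum the values of each key.
# Addition is associative and commutative, so partial sums are safe.
sub sum_by_key {
    my (@pairs) = @_;
    my %sum;
    $sum{ $_->[0] } += $_->[1] for @pairs;
    return map { [ $_, $sum{$_} ] } sort keys %sum;
}

# Simulate a job over per-mapper input splits (hypothetical helper).
# Returns the final counts and the number of pairs "sent over the network".
sub run_job {
    my ($splits, $use_combiner) = @_;
    my @shuffled;
    for my $split (@$splits) {
        my @out = map { map_line($_) } @$split;
        @out = sum_by_key(@out) if $use_combiner;    # local aggregation
        push @shuffled, @out;
    }
    my %result;
    $result{ $_->[0] } = $_->[1] for sum_by_key(@shuffled);
    return (\%result, scalar @shuffled);
}

# Two made-up input splits, one per simulated mapper.
my @splits = (
    [ 'a b a', 'a c' ],
    [ 'b b a' ],
);

my ($plain, $sent_plain) = run_job(\@splits, 0);
my ($comb,  $sent_comb)  = run_job(\@splits, 1);

printf "without combiner: %d pairs sent\n", $sent_plain;    # 8
printf "with combiner:    %d pairs sent\n", $sent_comb;     # 5
```

Both runs yield the same counts (a=4, b=3, c=1). Only the amount of intermediate data differs, which is the effect the exercise above asks you to measure on real input.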
