Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-10 [2012/01/25 15:46]
straka created
courses:mapreduce-tutorial:step-10 [2012/01/31 09:38] (current)
straka Change Perl commandline syntax.
Line 1: Line 1:
-====== MapReduce Tutorial :  ======
 +====== MapReduce Tutorial : Combiners ======
 + 
 +Sometimes the reduce step is a binary operation that is associative and commutative, e.g. ''+''. In that case it is inefficient to send every (key, value) pair produced by the mappers through the network.
 + 
 +Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the aggregated results are then sent through the network.
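
The saving can be illustrated without Hadoop. The following is a minimal sketch in plain Perl that only simulates the idea: the input data and the helpers ''map_words'' and ''sum_by_key'' are hypothetical and are not part of ''Hadoop::Runner''. It counts how many pairs would cross the network with and without local aggregation, and checks that the final counts agree.

```perl
use strict;
use warnings;

# Hypothetical input: each inner array holds the words seen by one mapper.
my @mapper_inputs = (
  [qw(a b a a c)],
  [qw(b a b c c c)],
);

# Map step: emit a (word, 1) pair for every word.
sub map_words {
  my ($words) = @_;
  return map { [$_, 1] } @$words;
}

# Reduce step for an associative and commutative operation (+):
# sum the values of each key and return sorted (key, sum) pairs.
sub sum_by_key {
  my (@pairs) = @_;
  my %sum;
  $sum{ $_->[0] } += $_->[1] for @pairs;
  return map { [$_, $sum{$_}] } sort keys %sum;
}

# Without a combiner, every pair emitted by a mapper crosses the network.
my @sent_plain = map { map_words($_) } @mapper_inputs;

# With a combiner, each mapper's output is summed locally first.
my @sent_combined = map { sum_by_key(map_words($_)) } @mapper_inputs;

# The final reduce runs on whatever was sent.
my %plain    = map { $_->[0] => $_->[1] } sum_by_key(@sent_plain);
my %combined = map { $_->[0] => $_->[1] } sum_by_key(@sent_combined);

printf "pairs sent without combiner: %d\n", scalar @sent_plain;
printf "pairs sent with combiner:    %d\n", scalar @sent_combined;
printf "%s: %d (plain) vs %d (combined)\n", $_, $plain{$_}, $combined{$_}
  for sort keys %plain;
```

For this toy input, 11 pairs are sent without local aggregation and 6 with it, while the final counts per word are identical. On real data the ratio depends on how often keys repeat within one mapper's output.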
 + 
 +A Hadoop job can have such a locally executed reducer, called a //combiner//. If a combiner is specified, the output of a mapper is processed by the combiner before the pairs are sent to the reducer. The combiner may be invoked zero, one, or multiple times, usually when the data are written to disk, so the result of the job must not depend on how often it runs.
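
Because the number of combiner invocations is not fixed, only operations whose result is independent of how the values are grouped are safe to use. The sketch below is a plain Perl simulation under stated assumptions, not Hadoop code: ''run_job'', ''reduce_sum'', ''reduce_mean'' and the sample values are hypothetical names introduced for illustration. It applies a candidate reducer as a combiner on chunks of different sizes and then once more as the final reducer.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical values emitted by one mapper for a single key.
my @values = (1, 2, 3, 4, 5, 6);

# Reducer candidates: both take a list of values and return one value.
sub reduce_sum  { return sum(@_) }
sub reduce_mean { return sum(@_) / @_ }

# Simulate a combiner invoked once per chunk of $chunk_size values,
# followed by the final reduce. A $chunk_size of 0 models the case
# where the combiner is not invoked at all.
sub run_job {
  my ($reduce, $chunk_size, @vals) = @_;
  my @sent = @vals;
  if ($chunk_size > 0) {
    @sent = ();
    while (my @chunk = splice(@vals, 0, $chunk_size)) {
      push @sent, $reduce->(@chunk);
    }
  }
  return $reduce->(@sent);
}

for my $chunk_size (0, 2, 4) {
  printf "chunk size %d: sum = %s, mean = %s\n", $chunk_size,
    run_job(\&reduce_sum,  $chunk_size, @values),
    run_job(\&reduce_mean, $chunk_size, @values);
}
```

The sum is 21 for every chunk size. The mean is 3.5 without combining but 4 with chunks of size 4, because a mean of partial means weights unequal chunks equally. Such a reducer cannot be reused as a combiner unchanged.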
 + 
 +Typically, the combiner is the same as the reducer of the MR job:
 + 
 +<file perl> 
 +package My::Mapper; 
 +... 
 + 
 +package My::Reducer; 
 +... 
 + 
 +package main; 
 +use Hadoop::Runner; 
 + 
 +my $runner = Hadoop::Runner->new( 
 +  mapper => My::Mapper->new(), 
 +  combiner => My::Reducer->new(), # Specify the combiner. 
 +  reducer => My::Reducer->new(), 
 +  input_format => 'KeyValueTextInputFormat'); 
 +... 
 +</file> 
 + 
 +===== Exercise ===== 
 + 
 +Compare the effect of adding a combiner to an MR job that counts occurrences of words in ''/home/straka/wiki/cs-text-medium'': {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-10-wc-without-combiner.pl}} and {{:courses:mapreduce-tutorial:step-10.txt|step-10-wc-with-combiner.pl}}.
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-10-wc-without-combiner.pl' 
 +  # NOW VIEW THE FILE 
 +  # $EDITOR step-10-wc-without-combiner.pl 
 +  rm -rf step-10-out-wout; time perl step-10-wc-without-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-wout 
 +   
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-10-wc-with-combiner.pl' 
 +  # NOW VIEW THE FILE 
 +  # $EDITOR step-10-wc-with-combiner.pl 
 +  rm -rf step-10-out-with; time perl step-10-wc-with-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-with 
 + 
 +How would you explain the results? 
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-9|Step 9]]: Hadoop properties.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-11|Step 11]]: Initialization and cleanup of MR tasks, performance of combiners.<html></td> 
 +</tr> 
 +</table> 
 +</html>
