
Institute of Formal and Applied Linguistics Wiki



MapReduce Tutorial: Combiners

Sometimes the reduce operation is an associative and commutative binary operation, e.g. +. In that case it is inefficient for the mappers to produce all the (key, value) pairs and send every one of them through the network.

Instead, the reducer can be executed right after the map, on a portion of the values belonging to the same key. Only the aggregated results are then sent through the network.
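To make the saving concrete, here is a minimal in-memory Perl sketch of word count for a single mapper. It involves no Hadoop at all: the input lines and the helper names map_words, sum_by_key and as_string are hypothetical and exist only for this illustration. It counts how many (key, value) pairs would leave the mapper with and without local aggregation.

```perl
# In-memory sketch (no Hadoop involved): word count over one mapper's
# input split, counting the (key, value) pairs that would be sent
# through the network with and without local aggregation.
use strict;
use warnings;

# Hypothetical input split processed by a single mapper.
my @lines = ('a b a', 'b a c', 'a a b');

# Mapper: emit a (word, 1) pair for every word occurrence.
sub map_words {
  my @pairs;
  for my $line (@_) {
    push @pairs, [$_, 1] for split ' ', $line;
  }
  return @pairs;
}

# Sum the values of all pairs sharing a key. Because + is associative
# and commutative, the same routine can run locally after the map
# and also as the final reducer.
sub sum_by_key {
  my %sum;
  $sum{$_->[0]} += $_->[1] for @_;
  return map { [$_, $sum{$_}] } sort keys %sum;
}

# Render pairs as "key=value" for printing and comparison.
sub as_string { return join ' ', map { join('=', @$_) } @_ }

my @without = map_words(@lines);              # every pair is sent
my @with    = sum_by_key(map_words(@lines));  # aggregated before sending

printf "pairs sent without local aggregation: %d\n", scalar @without;
printf "pairs sent with local aggregation:    %d\n", scalar @with;
print 'final counts: ', as_string(sum_by_key(@without)), "\n";
```

Running it reports 9 pairs without local aggregation and 3 with it, while the reducer computes the same final counts (a=5 b=3 c=1) from either stream.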

A Hadoop job can have such a locally executed reducer, called a combiner. If a combiner is specified, the output of each mapper is processed by the combiner before the pairs are sent to the reducers. The combiner may be invoked 0, 1 or multiple times, usually when the data are written to disk, so the result of the job must not depend on how many times the combiner runs.
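Because the number of combiner invocations is not under the programmer's control, the final result must be the same whether the combiner runs 0, 1 or several times, on whatever portions of the values it is given. The following in-memory Perl sketch (again no Hadoop; the values, the split into "spills" and the helper names reduce_sum and mean are hypothetical) checks this for summation and shows how it fails for averaging, which is not associative.

```perl
# In-memory sketch (no Hadoop involved): a summing combiner applied
# zero, one or several times to arbitrary portions of one key's values
# leaves the reducer's final result unchanged.
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical values emitted by the mappers for a single key.
my @values = (3, 1, 4, 1, 5, 9, 2);

# Combiner and reducer are the same operation: sum the values.
sub reduce_sum { return sum(@_) }

# 0 invocations: the reducer sees the raw values.
my $zero = reduce_sum(@values);

# 1 invocation: the combiner aggregates all values once.
my $one = reduce_sum(reduce_sum(@values));

# Several invocations: the combiner runs on separate portions
# (hypothetical spills to disk), then again on some partial sums.
my @spills = (reduce_sum(@values[0 .. 1]),
              reduce_sum(@values[2 .. 4]),
              reduce_sum(@values[5 .. 6]));          # (4, 10, 11)
my @merged  = (reduce_sum(@spills[0, 1]), $spills[2]); # (14, 11)
my $several = reduce_sum(@merged);

print "sum:  $zero $one $several\n";

# Counterexample: the mean is not associative, so using a mean
# "reducer" as a combiner on a portion of the values changes the result.
sub mean { return sum(@_) / @_ }
my @nums     = (1, 2, 3);
my $direct   = mean(@nums);                         # 6 / 3
my $combined = mean(mean(@nums[0, 1]), $nums[2]);   # mean(1.5, 3)

print "mean: $direct $combined\n";
```

The three sums all equal 25, whereas the direct mean is 2 and the "combined" mean is 2.25, which is why only associative and commutative reductions can be reused as combiners unchanged.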

Typically, the combiner is the same as the reducer of the MR job:

package My::Mapper;
...
 
package My::Reducer;
...
 
package main;
use Hadoop::Runner;
 
my $runner = Hadoop::Runner->new(
  mapper => My::Mapper->new(),
  combiner => My::Reducer->new(), # Specify the combiner.
  reducer => My::Reducer->new(),
  input_format => 'KeyValueTextInputFormat');
...

Exercise

Compare the effect of adding a combiner to an MR job which counts word occurrences in /home/straka/wiki/cs-text-medium, using step-10-wc-without-combiner.pl and step-10-wc-with-combiner.pl.

wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-10-wc-without-combiner.pl'
# NOW VIEW THE FILE
# $EDITOR step-10-wc-without-combiner.pl
rm -rf step-10-out-wout; time perl step-10-wc-without-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-wout

wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-10-wc-with-combiner.pl'
# NOW VIEW THE FILE
# $EDITOR step-10-wc-with-combiner.pl
rm -rf step-10-out-with; time perl step-10-wc-with-combiner.pl /home/straka/wiki/cs-text-medium/ step-10-out-with

How would you explain the results?



