
MapReduce Tutorial : Additional output from mappers and reducers

Sometimes it is useful to create output files manually in reducers, either because multiple files are needed per reducer or because a specific file format is desired.

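The problem is that the Hadoop framework can spawn several task attempts for the same reduce task, either because of speculative execution or because an earlier attempt is presumed to have crashed, even if it in fact did not. If every attempt wrote directly to the job output directory, the attempts would overwrite each other's files.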

For this reason, Hadoop creates a separate output directory for every reduce attempt it makes. If the attempt finishes successfully, the files in its directory are moved to the job output directory. The user must still ensure that different reducers produce different file names, usually by including the serial number of the reducer in the file name.
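The attempt-directory mechanism described above can be illustrated with a small local simulation. This is only a sketch of the idea, not Hadoop's actual implementation: the `_attempts` layout and the helpers `run_attempt`, `commit_attempt` and `discard_attempt` are hypothetical names made up for this example.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path remove_tree);
use File::Spec;
use File::Temp qw(tempdir);

# Simulate one task attempt: it writes $file_name into a directory of its own.
# The "_attempts" layout is a hypothetical stand-in, not Hadoop's real layout.
sub run_attempt {
    my ($job_dir, $attempt_id, $file_name, $content) = @_;
    my $attempt_dir = File::Spec->catdir($job_dir, '_attempts', $attempt_id);
    make_path($attempt_dir);
    my $path = File::Spec->catfile($attempt_dir, $file_name);
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    print {$out} $content;
    close($out) or die "Cannot close $path: $!";
    return $attempt_dir;
}

# Promote the files of a successful attempt into the job output directory.
sub commit_attempt {
    my ($job_dir, $attempt_dir) = @_;
    opendir(my $dh, $attempt_dir) or die "Cannot read $attempt_dir: $!";
    my @names = grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);
    for my $name (@names) {
        my $source = File::Spec->catfile($attempt_dir, $name);
        my $target = File::Spec->catfile($job_dir, $name);
        move($source, $target) or die "Cannot move $source: $!";
    }
    remove_tree($attempt_dir);
}

# Throw away the files of an attempt that failed or was made redundant.
sub discard_attempt {
    my ($attempt_dir) = @_;
    remove_tree($attempt_dir);
}

# Two attempts of the same reducer write the same file name without clashing.
my $job_dir = tempdir(CLEANUP => 1);
my $first  = run_attempt($job_dir, 'attempt_0', 'result-00000', "from attempt 0\n");
my $second = run_attempt($job_dir, 'attempt_1', 'result-00000', "from attempt 1\n");
commit_attempt($job_dir, $first);    # only the successful attempt is promoted
discard_attempt($second);            # the duplicate attempt is simply removed
```

Because each attempt only ever touches its own directory, it does not matter how many attempts run at once; only the promotion step writes to the job output directory.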

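Both pieces of information, the output directory of the current attempt (HADOOP_WORK_OUTPUT_PATH) and the serial number of the reducer, are available in the Perl API through environment variables.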
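As a minimal sketch of how a reducer might build its file name, the helper below combines HADOOP_WORK_OUTPUT_PATH with a reducer number. The helper name `work_output_file`, the `vystup` prefix and the five-digit padding are illustrative assumptions. The reducer's serial number is passed in as a plain argument, because this sketch does not assume the name of the variable that carries it. It also assumes that the work directory can be written with ordinary file operations.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Build a per-reducer file name inside the attempt's work directory.
# The prefix and the zero padding are illustrative choices, not a requirement.
sub work_output_file {
    my ($prefix, $reducer_number) = @_;
    my $dir = $ENV{HADOOP_WORK_OUTPUT_PATH};
    die "HADOOP_WORK_OUTPUT_PATH is not set\n"
        unless defined $dir && length $dir;
    return File::Spec->catfile($dir, sprintf('%s-%05d', $prefix, $reducer_number));
}

# Local dry run: outside Hadoop the variable is unset, so use a temp directory.
$ENV{HADOOP_WORK_OUTPUT_PATH} //= tempdir(CLEANUP => 1);

my $path = work_output_file('vystup', 3);   # 3 stands in for the reducer number
open(my $out, '>', $path) or die "Cannot open $path: $!";
print {$out} "word\t42\n";
close($out) or die "Cannot close $path: $!";
print "wrote $path\n";
```

Failing loudly when the variable is missing is a deliberate choice here: silently writing to the current directory would produce files that never reach the job output.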

Reduce-less jobs

If an MR job runs without reducers, the output of the mappers is written to the output directory without further processing. In this case, the environment variable HADOOP_WORK_OUTPUT_PATH is set even in a mapper, and the files created in this directory are copied to the job output directory.
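A mapper that wants to write such an extra file could first check whether the variable is set at all. The sketch below assumes that the variable is simply unset in mappers of jobs that do have reducers, which the text implies but does not state. The helper `open_side_output` and the file name `extra-00000` are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return a write handle for $name in the work directory, or nothing when the
# variable is not set (assumed here to mean "no work directory for this task").
sub open_side_output {
    my ($name) = @_;
    my $dir = $ENV{HADOOP_WORK_OUTPUT_PATH};
    return unless defined $dir && length $dir;
    my $path = File::Spec->catfile($dir, $name);
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    return $out;
}

# Local dry run: outside Hadoop the variable is unset, so use a temp directory.
$ENV{HADOOP_WORK_OUTPUT_PATH} //= tempdir(CLEANUP => 1);

# As with reducers, the file name should differ between mappers.
my $side = open_side_output('extra-00000');
if (defined $side) {
    print {$side} "word\t1\n";
    close($side) or die "Cannot close side output: $!";
} else {
    warn "No work directory available, skipping the extra output\n";
}
```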

Exercise

Change the word-counting script step-12-exercise.pl so that the reducers produce the results manually, using the environment variables mentioned above, and run it with four reducers.

wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-12-exercise.pl'
# NOW EDIT THE FILE
# $EDITOR step-12-exercise.pl
rm -rf step-12-out-ex; perl step-12-exercise.pl -c 4 -r 4 /home/straka/wiki/cs-text-medium/ step-12-out-ex
less step-12-out-ex/part-*

Solution

You can also download the solution step-12-solution.pl and check the correct output.

wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-12-solution.txt' -O 'step-12-solution.pl'
# NOW VIEW THE FILE
# $EDITOR step-12-solution.pl
rm -rf step-12-out-sol; perl step-12-solution.pl -c 4 -r 4 /home/straka/wiki/cs-text-medium/ step-12-out-sol
less step-12-out-sol/vystup-*
