Institute of Formal and Applied Linguistics Wiki


MapReduce Tutorial: Initialization and cleanup of MR tasks

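During the execution of a mapper or reducer task, the setup method is called before the first map or reduce call, and the cleanup method is called after the last one.

The setup and cleanup methods are therefore very useful for initialization and cleanup of the tasks.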

Please note that complex initialization should not be performed in the constructors of the Mapper and Reducer objects, because these objects are constructed every time the script is executed.
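The order of calls within one task can be sketched as follows. This is a framework-free Python illustration, not the Perl API used in this tutorial: the names Context, LoggingMapper and run_task are hypothetical and exist only for this example.

```python
class Context:
    """Hypothetical stand-in for the task context: collects written pairs."""

    def __init__(self):
        self.pairs = []

    def write(self, key, value):
        self.pairs.append((key, value))


class LoggingMapper:
    """Records the order in which its methods are called."""

    def __init__(self):
        # Keep the constructor cheap: it runs whenever the object is built.
        self.events = ["construct"]

    def setup(self, context):
        # Expensive initialization (loading models, opening files) belongs here.
        self.events.append("setup")

    def map(self, key, value, context):
        self.events.append("map")
        context.write(key, value)

    def cleanup(self, context):
        # Last chance to write pairs or release resources.
        self.events.append("cleanup")


def run_task(mapper, records):
    """Assumed task driver: setup once, map per record, cleanup once."""
    context = Context()
    mapper.setup(context)
    for key, value in records:
        mapper.map(key, value, context)
    mapper.cleanup(context)
    return context.pairs
```

Running run_task(LoggingMapper(), records) on two records leaves the event list as construct, setup, map, map, cleanup.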

Exercise

Improve the wc-without-combiner.pl script by manually combining the results in the Mapper: create a hash of word occurrences, fill it during the map calls, and output the (key, value) pairs in the cleanup method.

Then measure the improvement.
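The idea of the exercise can be illustrated outside Hadoop. The following Python sketch is not the Perl solution: Context, BaseMapper, PlainMapper, HashMapper and run_task are hypothetical names, and the word splitting by the regular expression \w+ is an assumption made for this example. It compares the number of pairs written by a mapper that emits every occurrence with one that accumulates counts and emits them in cleanup.

```python
import re
from collections import Counter


class Context:
    """Hypothetical stand-in for the task context: collects written pairs."""

    def __init__(self):
        self.pairs = []

    def write(self, key, value):
        self.pairs.append((key, value))


class BaseMapper:
    def setup(self, context):
        pass

    def cleanup(self, context):
        pass


class PlainMapper(BaseMapper):
    """Writes one (word, 1) pair per word occurrence."""

    def map(self, key, value, context):
        for word in re.findall(r"\w+", value):
            context.write(word, 1)


class HashMapper(BaseMapper):
    """Accumulates counts in memory and writes them once, in cleanup."""

    def setup(self, context):
        self.counts = Counter()

    def map(self, key, value, context):
        self.counts.update(re.findall(r"\w+", value))

    def cleanup(self, context):
        for word, count in self.counts.items():
            context.write(word, count)


def run_task(mapper, lines):
    """Assumed task driver: setup once, map per line, cleanup once."""
    context = Context()
    mapper.setup(context)
    for number, line in enumerate(lines):
        mapper.map(number, line, context)
    mapper.cleanup(context)
    return context.pairs
```

The trade-off is memory: the hash holds every distinct word seen by the task until cleanup runs.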

Solution.pl

Combiners and Perl API

As you have seen, combiners are not efficient when using the Perl API. The cause lies in Perl: reading and writing the (key, value) pairs is very slow, and a combiner does not help because it in fact increases the number of (key, value) pairs that need to be read and written.
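A rough way to see why is to count the pairs that the map-side script code has to read or write. The Python sketch below uses a deliberately simplified model, which is an assumption and not a description of Hadoop internals: the combiner runs once over all mapper output, and the hypothetical function map_side_pair_io counts only pairs handled by the mapper and the combiner.

```python
def map_side_pair_io(words, strategy):
    """Count pairs read or written by map-side script code (simplified model)."""
    emitted = len(words)        # one (word, 1) pair per occurrence
    distinct = len(set(words))  # one pair per distinct word after combining

    if strategy == "no-combiner":
        # The mapper writes every pair.
        return emitted
    if strategy == "combiner":
        # The mapper writes every pair, then the combiner reads them all
        # and writes one pair per distinct word.
        return emitted + emitted + distinct
    if strategy == "mapper-hash":
        # The mapper writes only the combined pairs, in cleanup.
        return distinct
    raise ValueError(f"unknown strategy: {strategy}")
```

Under this model the combiner variant always handles more pairs in the script than the variant without a combiner, while the hash inside the mapper handles the fewest.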

This is even more obvious with larger input data:

Script Seconds to complete on /home/straka/wiki/cs-text
wc-without-combiner.pl
wc-with-combiner.pl
wc-with-perl-hash.pl