MapReduce Tutorial: Initialization and cleanup of MR tasks
During the mapper or reducer task execution the following steps take place:
- The Perl script is executed in the current directory, i.e. the directory from which the job was executed or submitted.
- The Mapper/Reducer object is constructed.
- Method setup($self, $context) is called on this object. The $context can already be used to produce (key, value) pairs or increment counters.
- Method map or reduce is called for all input values.
- Method cleanup($self, $context) is called after all (key, value) pairs of this task are processed. Again, the $context can be used to produce (key, value) pairs or increment counters.
- The Perl script finishes.
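The order of calls listed above can be illustrated with a small stand-alone sketch. It does not use the real framework: SketchContext, SketchMapper, run_task and the method names write and counter are hypothetical stand-ins chosen for illustration, and the map signature ($self, $key, $value, $context) is likewise an assumption.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the framework's context object: it only records
# the emitted (key, value) pairs and the counter increments.
package SketchContext;
sub new {
    my ($class) = @_;
    return bless { pairs => [], counters => {} }, $class;
}
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [$key, $value];
    return;
}
sub counter {
    my ($self, $name, $increment) = @_;
    $self->{counters}{$name} += $increment;
    return;
}

# Hypothetical mapper that logs every lifecycle call it receives.
package SketchMapper;
sub new {
    my ($class) = @_;
    return bless { log => ['new'] }, $class;
}
sub setup {
    my ($self, $context) = @_;
    push @{ $self->{log} }, 'setup';
    $context->counter('tasks_started', 1);   # counters are usable in setup
    return;
}
sub map {
    my ($self, $key, $value, $context) = @_;
    push @{ $self->{log} }, 'map';
    $context->write($value, 1);
    return;
}
sub cleanup {
    my ($self, $context) = @_;
    push @{ $self->{log} }, 'cleanup';
    $context->write('__done__', 1);          # pairs can still be emitted here
    return;
}

package main;

# Simulates one task: construct, setup, map for every record, cleanup.
sub run_task {
    my ($mapper_class, @records) = @_;
    my $context = SketchContext->new;
    my $mapper  = $mapper_class->new;
    $mapper->setup($context);
    my $offset = 0;
    for my $record (@records) {
        $mapper->map($offset++, $record, $context);
    }
    $mapper->cleanup($context);
    return ($mapper, $context);
}

my ($mapper, $context) = run_task('SketchMapper', 'a', 'b');
print join(' -> ', @{ $mapper->{log} }), "\n";
# prints: new -> setup -> map -> map -> cleanup
```

In this sketch the context ends up holding three pairs: one per input record plus the one emitted from cleanup.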
The setup and cleanup methods are very useful for initialization and cleanup of the tasks.
Please note that complex initialization should not be performed during construction of Mapper and Reducer objects, as these are constructed every time the script is executed.
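One way to follow this advice is to keep the constructor cheap and move the costly work into setup. The sketch below is stand-alone and illustrative only: StopwordMapper, PairCollector, the stopword_list parameter and the $expensive_inits counter are hypothetical names, and building a small hash stands in for a genuinely expensive initialization such as loading a large dictionary.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical mapper: the constructor only stores parameters,
# the expensive structure is built in setup.
package StopwordMapper;

our $expensive_inits = 0;   # counts how often the costly step has run

sub new {
    my ($class, %args) = @_;
    # Cheap: remember the parameters, build nothing yet.
    return bless { stopword_list => $args{stopword_list} // [],
                   stopwords     => undef }, $class;
}

sub setup {
    my ($self, $context) = @_;
    # Stand-in for costly work; runs once per task, not per construction.
    $self->{stopwords} = {};
    $self->{stopwords}{$_} = 1 for @{ $self->{stopword_list} };
    $expensive_inits++;
    return;
}

sub map {
    my ($self, $key, $value, $context) = @_;
    for my $word (split ' ', $value) {
        next if $self->{stopwords}{$word};
        $context->write($word, 1);
    }
    return;
}

# Hypothetical context stand-in that collects emitted pairs as strings.
package PairCollector;
sub new {
    my ($class) = @_;
    return bless { pairs => [] }, $class;
}
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, "$key\t$value";
    return;
}

package main;

# Construction happens whenever the script runs; it stays cheap.
my $mapper = StopwordMapper->new(stopword_list => [qw(the a of)]);
my $inits_after_new = $StopwordMapper::expensive_inits;
print "expensive initializations after new: $inits_after_new\n";    # 0

# Only an actual task calls setup, which performs the costly step.
my $context = PairCollector->new;
$mapper->setup($context);
$mapper->map(0, 'the cleanup of a task', $context);
print "expensive initializations after setup: $StopwordMapper::expensive_inits\n";   # 1
print "$_\n" for @{ $context->{pairs} };
```

With this split, merely constructing the object costs nothing beyond storing its arguments.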
Exercise
Improve the wc-without-combiner.pl script by manually combining the results in the Mapper: create a hash of word occurrences, fill it during the map calls, and output the (key, value) pairs in the cleanup method.
Then measure the improvement.
