Differences
This shows you the differences between two versions of the page.
| | courses:mapreduce-tutorial:step-11 [2012/01/25 20:58] straka | | courses:mapreduce-tutorial:step-11 [2012/01/31 09:39] (current) straka Change Perl commandline syntax. |
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== MapReduce Tutorial : Initialization and cleanup of MR tasks ====== | + | ====== MapReduce Tutorial : Initialization and cleanup of MR tasks, performance of combiners ====== |
| During the mapper or reducer task execution the following steps take place: | During the mapper or reducer task execution the following steps take place: | ||
| Line 15: | Line 15: | ||
| ===== Exercise ===== | ===== Exercise ===== | ||
| - | Improve the {{: | + | Improve the {{: |
| - | Then measure | + | wget --no-check-certificate ' |
| + | # NOW EDIT THE FILE | ||
| + | # $EDITOR step-11-exercise.pl | ||
| + | rm -rf step-11-out-wout; | ||
| + | less step-11-out-wout/ | ||
| + | |||
| + | Measure | ||
| - | {{: | + | ==== Solution ==== |
| + | You can also download the solution | ||
| - | ===== Combiners and Perl API ===== | + | wget --no-check-certificate ' |
| + | # NOW VIEW THE FILE | ||
| + | # $EDITOR step-11-solution.pl | ||
| + | rm -rf step-11-out-with-hash; | ||
| + | less step-11-out-with-hash/ | ||
| - | As you have seen, the combiners are not efficient when using the Perl API. This is a problem of Perl API -- reading and writing the (key, value) pairs is relatively slow and a combiner does not help -- it in fact increases the number of (key, value) pairs that need to be read/ | + | |
| + | ===== Combiners and Perl API performance ===== | ||
| + | |||
| + | As you have seen, combiners are not very efficient when using the Perl API. This is a limitation of the Perl API: reading and writing the (key, value) pairs is relatively slow, so a combiner does not help; in fact, it increases the number of (key, value) pairs that need to be read/ | ||
| This is even more obvious with larger input data: | This is even more obvious with larger input data: | ||
| - | ^ Script ^ Time to complete on ''/ | + | ^ Script ^ Time to complete on ''/ |
| - | | {{: | + | | {{: |
| - | | {{: | + | | {{: |
| - | | {{: | + | | {{: |
| For comparison, here are the times of the Java solutions: | For comparison, here are the times of the Java solutions: | ||
| Line 35: | Line 50: | ||
| | Wordcount without combiner | 2mins, 26sec | 367MB | | | Wordcount without combiner | 2mins, 26sec | 367MB | | ||
| | Wordcount with combiner | 1min, 51sec | 51MB | | | Wordcount with combiner | 1min, 51sec | 51MB | | ||
| - | | Wordcount with hash in mapper | | | + | | Wordcount with hash in mapper | 1min, 14sec | 51MB | |
| + | Using the combiner is beneficial, although manually combining the word occurrences in the mapper is still faster. | ||
| + | |||
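
The "hash in mapper" variant in the tables above can be illustrated with a small framework-independent sketch. This is plain Python, not the tutorial's Perl API or Hadoop's Java API, and the function names (`map_plain`, `map_with_hash`, `reduce_counts`) are hypothetical. It assumes the mapper fills a hash while processing its input and emits the accumulated counts once, when the task finishes, so far fewer (key, value) pairs have to be written and read.

```python
from collections import Counter


def map_plain(lines):
    """Emit one (word, 1) pair per word occurrence (no local aggregation)."""
    for line in lines:
        for word in line.split():
            yield word, 1


def map_with_hash(lines):
    """Accumulate counts in a hash and emit each word once at the end.

    Assumption: this mirrors a mapper that updates a hash in every map
    call and flushes it in its cleanup step.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    yield from counts.items()


def reduce_counts(pairs):
    """Sum the values for each key, as a word-count reducer would."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


lines = ["a b a", "b a c"]
plain = list(map_plain(lines))
hashed = list(map_with_hash(lines))

print(len(plain), len(hashed))                        # 6 3
print(reduce_counts(plain) == reduce_counts(hashed))  # True
```

Both mappers lead to the same final counts, but the hashed one emits 3 pairs instead of 6. On real input the gap depends on how often words repeat within one mapper's share of the data.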
