courses:mapreduce-tutorial:step-11 [2012/01/25 20:53] straka |
courses:mapreduce-tutorial:step-11 [2012/01/31 09:39] (current) straka Change Perl commandline syntax. |
====== MapReduce Tutorial : Initialization and cleanup of MR tasks ====== | ====== MapReduce Tutorial : Initialization and cleanup of MR tasks, performance of combiners ====== |
| |
During the mapper or reducer task execution the following steps take place: | During the mapper or reducer task execution the following steps take place: |
===== Exercise ===== | ===== Exercise ===== |
| |
Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, fill it during the ''map'' calls and output the (key, value) pairs in the ''cleanup'' method. | Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, populate it during the ''map'' calls without outputting any results, and finally output all (key, value) pairs in the ''cleanup'' method. |
| |
Then measure the improvement. | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl' |
| # NOW EDIT THE FILE |
| # $EDITOR step-11-wc-without-combiner.pl
| rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl /home/straka/wiki/cs-text-medium/ step-11-out-wout |
| less step-11-out-wout/part-* |
| |
| Measure the improvement. |
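
The manual combining described in this exercise can be sketched outside Hadoop. The Python code below is only an illustration and does not use the Perl API of this tutorial: the class ''HashCombiningMapper'', the ''emit'' callback and the driver ''run_mapper'' are hypothetical names that merely mimic the sequence of ''map'' calls followed by one ''cleanup'' call.

```python
from collections import defaultdict


class HashCombiningMapper:
    """Word-count mapper that combines counts in memory (hypothetical class)."""

    def __init__(self):
        # Hash of word occurrences, kept for the whole lifetime of the task.
        self.counts = defaultdict(int)

    def map(self, key, value):
        # Only update the hash; no (key, value) pair is emitted here.
        for word in value.split():
            self.counts[word] += 1

    def cleanup(self, emit):
        # Called once after the last map call: one pair per distinct word.
        for word, count in self.counts.items():
            emit(word, count)


def run_mapper(lines):
    """Hypothetical driver: feed all lines to map, then call cleanup once."""
    mapper = HashCombiningMapper()
    for offset, line in enumerate(lines):
        mapper.map(offset, line)
    output = []
    mapper.cleanup(lambda word, count: output.append((word, count)))
    return output


if __name__ == "__main__":
    print(sorted(run_mapper(["a b a", "b a c"])))
```

In this sketch the mapper emits one pair per distinct word instead of one pair per word occurrence. The price is that the whole hash must fit into the memory of the task.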
| |
{{:courses:mapreduce-tutorial:step-11-solution.txt|Solution.pl}} | ==== Solution ==== |
| You can also download the solution {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} and check the correct output. |
| |
===== Combiners and Perl API ===== | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl' |
| # NOW VIEW THE FILE |
| # $EDITOR step-11-wc-with-perl-hash.pl
| rm -rf step-11-out-with-hash; time perl step-11-wc-with-perl-hash.pl /home/straka/wiki/cs-text-medium/ step-11-out-with-hash |
| less step-11-out-with-hash/part-* |
| |
As you have seen, the combiners are not efficient when using the Perl API. This is a problem of the Perl API -- reading and writing the (key, value) pairs is relatively slow and a combiner does not help -- it in fact increases the number of (key, value) pairs that need to be read/written. | |
| ===== Combiners and Perl API performance ===== |
| |
| As you have seen, combiners are not very efficient when using the Perl API. The cause lies in the Perl API itself: reading and writing (key, value) pairs is relatively slow, so a combiner does not help. In fact, it increases the number of (key, value) pairs that need to be read and written. |
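
A rough count shows why a combiner can add work. The Python sketch below rests on simplifying assumptions that are not part of the tutorial: a single mapper, a combiner that merges every duplicate word, and the final reducer output left out because it is the same in all three variants. The function name ''pairs_through_api'' is hypothetical.

```python
def pairs_through_api(total_words, distinct_words):
    """Count (key, value) pairs read or written by user code in each variant."""
    n, u = total_words, distinct_words
    return {
        # Mapper writes n pairs, reducer reads n pairs.
        "without combiner": n + n,
        # Mapper writes n, combiner reads n and writes u, reducer reads u.
        "with combiner": n + n + u + u,
        # Mapper writes u pairs in cleanup, reducer reads u pairs.
        "hash in mapper": u + u,
    }


if __name__ == "__main__":
    for variant, pairs in pairs_through_api(1000, 100).items():
        print(variant, pairs)
```

Under these assumptions the combiner variant never handles fewer pairs than the variant without a combiner, while the hash in the mapper handles the fewest.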
| |
The slowdown caused by combiners is even more obvious with larger input data: | The slowdown caused by combiners is even more obvious with larger input data: |
^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ | ^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Commands ^ |
| {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} | 5mins, 4sec | | | {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} | 5mins, 4sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'<br>rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl /home/straka/wiki/cs-text/ step-11-out-wout</pre></html> | |
| {{:courses:mapreduce-tutorial:step-10.txt|wc-with-combiner.pl}} | 5mins, 33sec | | | {{:courses:mapreduce-tutorial:step-10.txt|step-11-wc-with-combiner.pl}} | 5mins, 33sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-11-wc-with-combiner.pl'<br>rm -rf step-11-out-with-combiner; time perl step-11-wc-with-combiner.pl /home/straka/wiki/cs-text/ step-11-out-with-combiner</pre></html>| |
| {{:courses:mapreduce-tutorial:step-11-solution.txt|wc-with-perl-hash.pl}} | 2mins, 24sec | | | {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} | 2mins, 24sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'<br>rm -rf step-11-out-with-perl-hash; time perl step-11-wc-with-perl-hash.pl /home/straka/wiki/cs-text/ step-11-out-with-perl-hash</pre></html>| |
| |
| |
| For comparison, here are the times of the Java solutions:
| ^ Program ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Size of map output ^ |
| | Wordcount without combiner | 2mins, 26sec | 367MB | |
| | Wordcount with combiner | 1min, 51sec | 51MB | |
| | Wordcount with hash in mapper | 1min, 14sec | 51MB | |
| Using the combiner is beneficial, although combining the word occurrences manually in the mapper is still faster. |
| |