Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-11 [2012/01/25 20:58]
straka
courses:mapreduce-tutorial:step-11 [2012/01/30 16:45]
dusek
Line 1: Line 1:
-====== MapReduce Tutorial : Initialization and cleanup of MR tasks ======
+====== MapReduce Tutorial : Initialization and cleanup of MR tasks, performance of combiners ======
  
 During the mapper or reducer task execution the following steps take place:
Line 15: Line 15:
 ===== Exercise =====
  
-Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, fill it during the ''map'' calls and output the (key, value) pairs in ''cleanup'' method.
+Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, populate it during the ''map'' calls without outputting results and finally output all (key, value) pairs in the ''cleanup'' method.
  
-Then measure the improvement.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'
 +  # NOW EDIT THE FILE 
 +  # $EDITOR step-11-wc-without-combiner.pl
 +  rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl run /home/straka/wiki/cs-text-medium/ step-11-out-wout 
 +  less step-11-out-wout/part-* 
 +       
 +Measure the improvement.
  
-{{:courses:mapreduce-tutorial:step-11-solution.txt|Solution.pl}}
+==== Solution ====
 +You can also download the solution {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} and check the correct output.
  
-===== Combiners and Perl API =====
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'
 +  # NOW VIEW THE FILE 
 +  # $EDITOR step-11-wc-with-perl-hash.pl
 +  rm -rf step-11-out-with-hash; time perl step-11-wc-with-perl-hash.pl run /home/straka/wiki/cs-text-medium/ step-11-out-with-hash 
 +  less step-11-out-with-hash/part-*
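
The idea behind the exercise can be tried outside Hadoop. The following is a minimal sketch, not the tutorial's Perl API: plain ''awk'' stands in for the mapper, its per-line action plays the role of ''map'' (it only updates an in-memory hash) and its ''END'' block plays the role of ''cleanup'' (it emits one pair per distinct word). The file names ''sample.txt'' and ''combined.txt'' are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch only: awk stands in for a mapper; this is not the Hadoop Perl API.
# sample.txt is a hypothetical two-line input created for the illustration.
printf 'a b a\nb a c\n' > sample.txt

# Per-line action ("map"): update the hash of word occurrences, print nothing.
# END block ("cleanup"): emit one (word, count) pair per distinct word.
awk '
  { for (i = 1; i <= NF; i++) count[$i]++ }
  END { for (w in count) printf "%s\t%d\n", w, count[w] }
' sample.txt | sort > combined.txt

cat combined.txt
```

Six input tokens are reduced to three output pairs, which is the effect the hash in the mapper is meant to achieve.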
  
-As you have seen, the combiners are not efficient when using the Perl API. This is a problem of Perl API -- reading and writing the (key, value) pairs is relatively slow and a combiner does not help -- it in fact increases the number of (key, value) pairs that need to be read/written.
+
 +===== Combiners and Perl API performance ===== 
 + 
 +As you have seen, combiners are not very efficient when using the Perl API. This is a limitation of the Perl API: reading and writing the (key, value) pairs is relatively slow, so a combiner does not help. It in fact increases the number of (key, value) pairs that need to be read and written.
  
 This is even more obvious with larger input data:
-^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^
-| {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} | 5mins, 4sec |
-| {{:courses:mapreduce-tutorial:step-10.txt|wc-with-combiner.pl}} | 5mins, 33sec |
-| {{:courses:mapreduce-tutorial:step-11-solution.txt|wc-with-perl-hash.pl}} | 2mins, 24sec |
+^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Commands ^
+| {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} | 5mins, 4sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'<br>rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl run /home/straka/wiki/cs-text/ step-11-out-wout</pre></html> |
+| {{:courses:mapreduce-tutorial:step-10.txt|step-11-wc-with-combiner.pl}} | 5mins, 33sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-11-wc-with-combiner.pl'<br>rm -rf step-11-out-with-combiner; time perl step-11-wc-with-combiner.pl run /home/straka/wiki/cs-text/ step-11-out-with-combiner</pre></html> |
+| {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} | 2mins, 24sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'<br>rm -rf step-11-out-with-perl-hash; time perl step-11-wc-with-perl-hash.pl run /home/straka/wiki/cs-text/ step-11-out-with-perl-hash</pre></html> |
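
One way to see why the combiner costs more than it saves here is to count the (key, value) pairs that pass through the scripting API on the map side. The sketch below uses a deliberately simplified accounting model, which is an assumption and not a measurement of Hadoop: the mapper writes one pair per token, a separate combiner reads all of them again and writes one pair per distinct word, and a hash in the mapper writes only the distinct pairs. The file name ''pairs-sample.txt'' is hypothetical.

```shell
#!/bin/sh
# Sketch only: toy pair counting under an assumed, simplified accounting model.
printf 'a b a\nb a c\n' > pairs-sample.txt

# One (word, 1) pair per token is emitted when the mapper does no combining.
emitted=$(tr -s ' ' '\n' < pairs-sample.txt | grep -c .)

# One pair per distinct word remains after combining.
distinct=$(tr -s ' ' '\n' < pairs-sample.txt | sort -u | grep -c .)

# Pairs read or written through the API on the map side (assumed model):
without_combiner=$emitted                          # mapper writes
with_combiner=$((emitted + emitted + distinct))    # mapper writes, combiner reads and writes
with_hash=$distinct                                # mapper writes combined pairs in cleanup

echo "without combiner: $without_combiner"
echo "with combiner:    $with_combiner"
echo "with hash:        $with_hash"
```

For this toy input the counts are 6, 15 and 3, which matches the ordering of the measured times in the table: the combiner variant handles the most pairs and the hash variant the fewest.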
  
 For comparison, here are times of Java solutions:
Line 35: Line 50:
 | Wordcount without combiner | 2mins, 26sec | 367MB |
 | Wordcount with combiner | 1min, 51sec | 51MB |
-| Wordcount with hash in mapper |  |
+| Wordcount with hash in mapper | 1min, 14sec | 51MB |
 +Using the combiner is beneficial, although manually combining the word occurrences in the mapper is still faster.
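
The reported times can be turned into speedup factors relative to the run without a combiner. The following sketch does that arithmetic; the helper names ''to_seconds'' and ''speedup'' are hypothetical, and the only inputs are the times listed in the two tables.

```shell
#!/bin/sh
# to_seconds MIN SEC: hypothetical helper, converts a wall-clock time to seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

# speedup BASE TIME: hypothetical helper, prints BASE / TIME with two decimals.
speedup() { awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'; }

# Perl API times from the table above.
perl_base=$(to_seconds 5 4)    # without combiner: 5mins, 4sec
perl_comb=$(to_seconds 5 33)   # with combiner:    5mins, 33sec
perl_hash=$(to_seconds 2 24)   # with Perl hash:   2mins, 24sec

# Java times from the table above.
java_base=$(to_seconds 2 26)   # without combiner: 2mins, 26sec
java_comb=$(to_seconds 1 51)   # with combiner:    1min, 51sec
java_hash=$(to_seconds 1 14)   # hash in mapper:   1min, 14sec

echo "Perl combiner: $(speedup "$perl_base" "$perl_comb")x"
echo "Perl hash:     $(speedup "$perl_base" "$perl_hash")x"
echo "Java combiner: $(speedup "$java_base" "$java_comb")x"
echo "Java hash:     $(speedup "$java_base" "$java_hash")x"
```

With these inputs the Perl combiner comes out at 0.91x (a slowdown) and the Perl hash at 2.11x, while in Java the combiner gives 1.32x and the hash in the mapper 1.97x.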
 + 
 +----
  
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-12|Step 12]]: Additional output from mappers and reducers.<html></td>
 +</tr>
 +</table>
 +</html>
