
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-11 [2012/01/31 09:39] (current), straka: Change Perl commandline syntax.
====== MapReduce Tutorial : Initialization and cleanup of MR tasks, performance of combiners ======
  
During the execution of a mapper or reducer task, the following steps take place:
===== Exercise =====
  
Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} script by manually combining the results in the mapper: create a hash of word occurrences, populate it during the ''map'' calls without outputting any results, and finally output all (key, value) pairs in the ''cleanup'' method.
  
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'
  # NOW EDIT THE FILE
  # $EDITOR step-11-wc-without-combiner.pl
  rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl /home/straka/wiki/cs-text-medium/ step-11-out-wout
  less step-11-out-wout/part-*

Measure the improvement by comparing the time reported by ''time'' for the original and the modified script.
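The idea of in-mapper combining can be tried out locally, without Hadoop. The following sketch is only an analogy and does not use the Perl API of this tutorial: it relies on ''awk'', whose per-line action plays the role of the ''map'' calls and whose ''END'' block plays the role of the ''cleanup'' method. The function names ''map_plain'' and ''map_with_hash'' are hypothetical, and words are simply split on whitespace.

```shell
#!/bin/sh
# Local analogy of the two mapper variants (NOT the Hadoop Perl API).
# Each input line stands for one value passed to map().

# Variant 1: emit a (word, 1) pair for every word occurrence.
map_plain() {
  awk '{
    # one output pair per word occurrence
    for (i = 1; i <= NF; i++)
      print $i "\t" 1
  }'
}

# Variant 2: in-mapper combining with a hash (awk associative array).
map_with_hash() {
  awk '{
    # only update the hash; nothing is written during "map"
    for (i = 1; i <= NF; i++)
      count[$i]++
  }
  END {
    # emit one pair per distinct word, as cleanup would
    # (the iteration order of for-in is unspecified)
    for (word in count)
      print word "\t" count[word]
  }'
}

# Example: the same two input lines through both variants.
printf 'a b a\nb a c\n' | map_plain | wc -l        # prints 6
printf 'a b a\nb a c\n' | map_with_hash | sort     # prints a 3, b 2, c 1 (tab-separated)
```

Both variants yield the same counts once the pairs are summed per word, but ''map_with_hash'' writes one pair per distinct word instead of one pair per occurrence, which is what reduces the amount of data the mapper has to output.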
  
==== Solution ====
You can also download the solution {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} and compare your output with the correct one.
  
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'
  # NOW VIEW THE FILE
  # $EDITOR step-11-wc-with-perl-hash.pl
  rm -rf step-11-out-with-hash; time perl step-11-wc-with-perl-hash.pl /home/straka/wiki/cs-text-medium/ step-11-out-with-hash
  less step-11-out-with-hash/part-*
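To check the output of your modified script against the solution, you can compare the two output directories. This is a minimal sketch, assuming both jobs write plain-text ''word<TAB>count'' lines into ''part-*'' files; the helper name ''same_counts'' is hypothetical. Sorting all lines first makes the comparison independent of which reducer output file a word ended up in.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if two MapReduce output
# directories contain the same lines, regardless of which part-* file
# each line ended up in. Assumes plain-text (uncompressed) output files.
same_counts() {
  dir1=$1
  dir2=$2
  tmp1=$(mktemp) || return 2
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
  # Concatenate all reducer outputs and sort them into a canonical order.
  cat "$dir1"/part-* | LC_ALL=C sort > "$tmp1"
  cat "$dir2"/part-* | LC_ALL=C sort > "$tmp2"
  # cmp -s prints nothing; its exit status tells whether the files match.
  cmp -s "$tmp1" "$tmp2"
  rc=$?
  rm -f "$tmp1" "$tmp2"
  return "$rc"
}

# Usage with the output directories created by the commands above:
# same_counts step-11-out-wout step-11-out-with-hash && echo "outputs match"
```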
  
===== Combiners and Perl API performance =====

As you have seen, combiners are not very efficient when using the Perl API. The problem lies in the Perl API itself: reading and writing the (key, value) pairs is relatively slow, and a combiner does not help. In fact, it increases the number of (key, value) pairs that need to be read and written, because every pair emitted by a mapper must additionally be read by the combiner, which then writes its own output pairs.
  
This is even more obvious with larger input data:
^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Commands ^
| {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} | 5mins, 4sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'<br>rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl /home/straka/wiki/cs-text/ step-11-out-wout</pre></html> |
| {{:courses:mapreduce-tutorial:step-10.txt|step-11-wc-with-combiner.pl}} | 5mins, 33sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-11-wc-with-combiner.pl'<br>rm -rf step-11-out-with-combiner; time perl step-11-wc-with-combiner.pl /home/straka/wiki/cs-text/ step-11-out-with-combiner</pre></html> |
| {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} | 2mins, 24sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'<br>rm -rf step-11-out-with-perl-hash; time perl step-11-wc-with-perl-hash.pl /home/straka/wiki/cs-text/ step-11-out-with-perl-hash</pre></html> |
  
For comparison, here are the times of the Java solutions:
^ Program ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Size of map output ^
| Wordcount without combiner | 2mins, 26sec | 367MB |
| Wordcount with combiner | 1min, 51sec | 51MB |
| Wordcount with hash in mapper | 1min, 14sec | 51MB |

For the Java solutions, using the combiner is beneficial, although combining the word occurrences manually in the mapper is still faster.
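The statement that a Perl combiner increases the number of (key, value) pairs that must be read and written can be made concrete with a back-of-the-envelope count. The sketch below uses a simplified cost model, which is an assumption and not a measurement: each pair written or read by a Perl mapper, combiner or reducer costs one slow I/O operation, the input stands for a single mapper's input split, and ''perl_io_cost'' is a hypothetical helper. With N word occurrences and K distinct words, the totals are 2N without a combiner, 2N + 2K with a combiner and 2K with a hash in the mapper.

```shell
#!/bin/sh
# Simplified cost model (an assumption, not a measurement): every
# (key, value) pair that a Perl mapper, combiner or reducer writes or
# reads counts as one slow I/O operation. The input stands for a single
# mapper's input split; words are split on whitespace.
perl_io_cost() {
  awk '
  {
    for (i = 1; i <= NF; i++) {
      # N: word occurrences, i.e. pairs written by a plain mapper
      n++
      # K: distinct words, i.e. pairs left after combining
      if (!($i in seen)) {
        seen[$i] = 1
        k++
      }
    }
  }
  END {
    # mapper writes N, reducer reads N
    printf "without combiner: %d\n", 2 * n
    # mapper writes N, combiner reads N and writes K, reducer reads K
    printf "with combiner: %d\n", 2 * n + 2 * k
    # mapper writes K pairs from its hash, reducer reads K
    printf "with hash in mapper: %d\n", 2 * k
  }'
}

# Example: N = 6 occurrences, K = 3 distinct words.
printf 'a b a\nb a c\n' | perl_io_cost
# prints:
# without combiner: 12
# with combiner: 18
# with hash in mapper: 6
```

The model ignores sorting, spilling and network transfer, so it only explains the direction of the Perl timings above, not their exact values.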

----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-12|Step 12]]: Additional output from mappers and reducers.<html></td>
</tr>
</table>
</html>
