Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-11 [2012/01/25 19:38]
straka
courses:mapreduce-tutorial:step-11 [2012/01/30 16:45]
dusek
Line 1: Line 1:
-====== MapReduce Tutorial : Initialization and cleanup of MR tasks ======+====== MapReduce Tutorial : Initialization and cleanup of MR tasks, performance of combiners ======
  
 During the mapper or reducer task execution the following steps take place: During the mapper or reducer task execution the following steps take place:
Line 15: Line 15:
 ===== Exercise ===== ===== Exercise =====
  
-Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, fill it during the ''map'' calls and output the (key, value) pairs in ''cleanup'' method.+Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, populate it during the ''map'' calls without outputting results and finally output all (key, value) pairs in the ''cleanup'' method.
  
-Then measure the improvement.+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl' 
 +  # NOW EDIT THE FILE 
 +  # $EDITOR step-11-wc-without-combiner.pl 
 +  rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl run /home/straka/wiki/cs-text-medium/ step-11-out-wout 
 +  less step-11-out-wout/part-* 
 +       
 +Measure the improvement.
  
-{{:courses:mapreduce-tutorial:step-11-solution.txt|Solution.pl}}+==== Solution ==== 
 +You can also download the solution {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} and check the correct output.
  
-===== Combiners and Perl API =====+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl' 
 +  # NOW VIEW THE FILE 
 +  # $EDITOR step-11-wc-with-perl-hash.pl 
 +  rm -rf step-11-out-with-hash; time perl step-11-wc-with-perl-hash.pl run /home/straka/wiki/cs-text-medium/ step-11-out-with-hash 
 +  less step-11-out-with-hash/part-*
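
The in-mapper combining idea from the exercise can be sketched independently of Hadoop. The snippet below is a minimal, self-contained simulation in Python, not the tutorial's Perl API: ''FakeContext'', ''PlainMapper'', ''HashMapper'' and ''run'' are hypothetical names introduced only for illustration. It assumes a mapper with ''map'' and ''cleanup'' hooks and shows how many (key, value) pairs each variant writes.

```python
from collections import Counter


class FakeContext:
    """Hypothetical stand-in for the framework's output collector."""

    def __init__(self):
        self.pairs = []

    def write(self, key, value):
        self.pairs.append((key, value))


class PlainMapper:
    """Writes one (word, 1) pair per token, like a mapper without combining."""

    def map(self, key, value, context):
        for word in value.split():
            context.write(word, 1)

    def cleanup(self, context):
        pass  # nothing is buffered, so there is nothing to flush


class HashMapper:
    """Accumulates counts in a hash and writes them once, in cleanup."""

    def __init__(self):
        self.counts = Counter()

    def map(self, key, value, context):
        for word in value.split():
            self.counts[word] += 1  # no output during map calls

    def cleanup(self, context):
        for word, count in self.counts.items():
            context.write(word, count)


def run(mapper, lines):
    """Feed every input line to the mapper, then call cleanup once."""
    context = FakeContext()
    for offset, line in enumerate(lines):
        mapper.map(offset, line, context)
    mapper.cleanup(context)
    return context.pairs


lines = ["a b a", "b a c"]
plain = run(PlainMapper(), lines)
hashed = run(HashMapper(), lines)
print(len(plain), len(hashed))  # pairs written by each variant
print(sorted(hashed))
```

On this toy input the plain mapper writes one pair per token, while the hash-based mapper writes one pair per distinct word. The price is that the hash has to fit in the mapper's memory.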
  
-As you have seen, the combiners are not efficient when using the Perl API. This is a problem of Perl -- reading and writing the (key, value) pairs is very slow and a combiner does not help -- it in fact increases the number of (key, value) pairs that need to be read/written.+ 
 +===== Combiners and Perl API performance ===== 
 + 
 +As you have seen, combiners are not very efficient with the Perl API. This is a limitation of the Perl API: reading and writing (key, value) pairs is relatively slow, so a combiner does not help. In fact, it increases the number of (key, value) pairs that need to be read and written.
  
 This is even more obvious with larger input data: This is even more obvious with larger input data:
-^ Script ^ Seconds to complete on ''/home/straka/wiki/cs-text'' ^+^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Commands ^
-| {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} | | +| {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} | 5mins, 4sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'<br>rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl run /home/straka/wiki/cs-text/ step-11-out-wout</pre></html> |
-| {{:courses:mapreduce-tutorial:step-10.txt|wc-with-combiner.pl}} | | +| {{:courses:mapreduce-tutorial:step-10.txt|step-11-wc-with-combiner.pl}} | 5mins, 33sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-11-wc-with-combiner.pl'<br>rm -rf step-11-out-with-combiner; time perl step-11-wc-with-combiner.pl run /home/straka/wiki/cs-text/ step-11-out-with-combiner</pre></html> |
-| | |+| {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} | 2mins, 24sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'<br>rm -rf step-11-out-with-perl-hash; time perl step-11-wc-with-perl-hash.pl run /home/straka/wiki/cs-text/ step-11-out-with-perl-hash</pre></html> |
 + 
 + 
 +For comparison, here are the times of the Java solutions: 
 +^ Program ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Size of map output ^ 
 +| Wordcount without combiner | 2mins, 26sec | 367MB | 
 +| Wordcount with combiner | 1min, 51sec | 51MB | 
 +| Wordcount with hash in mapper | 1min, 14sec | 51MB | 
 +In Java, using the combiner is beneficial (it shrinks the map output from 367MB to 51MB), although manually combining the word occurrences in the mapper is still faster. 
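
To compare the measurements more directly, the times can be converted to seconds and expressed relative to the run without a combiner. The following is a small Python sketch that uses only the figures from the two tables; the helper ''to_seconds'' and the ''timings'' dictionary are hypothetical names chosen for this illustration.

```python
import re


def to_seconds(text):
    """Convert a duration such as '5mins, 4sec' or '1min, 51sec' to seconds."""
    match = re.fullmatch(r"(\d+)mins?, (\d+)sec", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


# Times copied from the tables for /home/straka/wiki/cs-text.
timings = {
    "Perl": {
        "without combiner": "5mins, 4sec",
        "with combiner": "5mins, 33sec",
        "hash in mapper": "2mins, 24sec",
    },
    "Java": {
        "without combiner": "2mins, 26sec",
        "with combiner": "1min, 51sec",
        "hash in mapper": "1min, 14sec",
    },
}

for api, rows in timings.items():
    baseline = to_seconds(rows["without combiner"])
    for variant, text in rows.items():
        seconds = to_seconds(text)
        # A speedup below 1.00 means the variant is slower than the baseline.
        print(f"{api:5} {variant:17} {seconds:4d} s  speedup {baseline / seconds:.2f}x")
```

With these figures, the hash-in-mapper variant is roughly twice as fast as the run without a combiner in both Perl and Java, while the combiner variant is slower than the baseline in Perl and faster in Java.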
 + 
 +----
  
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-12|Step 12]]: Additional output from mappers and reducers.<html></td>
 +</tr>
 +</table>
 +</html>
