
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-11 [2012/01/25 22:14] straka
courses:mapreduce-tutorial:step-11 [2012/01/30 16:45] dusek
Line 15: Line 15:
 ===== Exercise ===== ===== Exercise =====
  
-Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, populate it during the ''map'' calls without outputting results and finally output all (key, value) pairs in the ''cleanup'' method.
+Improve the {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} script by manually combining the results in the Mapper -- create a hash of word occurrences, populate it during the ''map'' calls without outputting results and finally output all (key, value) pairs in the ''cleanup'' method.
  
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'
 +  # NOW EDIT THE FILE
 +  # $EDITOR step-11-wc-without-combiner.pl
 +  rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl run /home/straka/wiki/cs-text-medium/ step-11-out-wout
 +  less step-11-out-wout/part-*
 +      
 Measure the improvement. Measure the improvement.
  
-{{:courses:mapreduce-tutorial:step-11-solution.txt|Solution.pl}}
+==== Solution ==== 
 +You can also download the solution {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} and check the correct output. 
 + 
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl' 
 +  # NOW VIEW THE FILE 
 +  # $EDITOR step-11-wc-with-perl-hash.pl 
 +  rm -rf step-11-out-with-hash; time perl step-11-wc-with-perl-hash.pl run /home/straka/wiki/cs-text-medium/ step-11-out-with-hash 
 +  less step-11-out-with-hash/part-* 
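
The in-mapper combining described in the exercise can be sketched as follows. This is only an illustration in Python, not the tutorial's Perl API and not the contents of the downloadable solution: the class ''HashCombiningMapper'', the ''emit'' callback and the ''run_mapper'' driver are hypothetical names chosen for this example. It assumes the framework calls ''map'' once per input record and ''cleanup'' once at the end.

```python
from collections import defaultdict


class HashCombiningMapper:
    """Word-count mapper that combines counts in memory.

    Hypothetical class for illustration; the tutorial's real mapper is
    written against its Perl API, which is not reproduced here.
    """

    def __init__(self):
        # Hash of word -> number of occurrences seen by this mapper.
        self.counts = defaultdict(int)

    def map(self, key, value, emit):
        # Only update the in-memory hash; nothing is output here.
        for word in value.split():
            self.counts[word] += 1

    def cleanup(self, emit):
        # Output every (word, count) pair exactly once, at the end.
        for word, count in self.counts.items():
            emit(word, count)


def run_mapper(lines):
    """Tiny stand-in for the framework: feed lines, collect the output."""
    output = []
    mapper = HashCombiningMapper()

    def emit(key, value):
        output.append((key, value))

    for line_number, line in enumerate(lines):
        mapper.map(line_number, line, emit)
    mapper.cleanup(emit)
    return output


if __name__ == "__main__":
    for word, count in sorted(run_mapper(["a b a", "b a"])):
        print(word, count)
```

The trade-off is memory: the hash must hold every distinct word the mapper sees, whereas a mapper that outputs (word, 1) immediately keeps no state.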
  
 ===== Combiners and Perl API performance ===== ===== Combiners and Perl API performance =====
Line 26: Line 40:
  
 This is even more obvious with larger input data: This is even more obvious with larger input data:
-^ Script ^ Time to complete on ''/home/straka/wiki/cs-text''
+^ Script ^ Time to complete on ''/home/straka/wiki/cs-text'' ^ Commands 
-| {{:courses:mapreduce-tutorial:step-5-solution1.txt|wc-without-combiner.pl}} | 5mins, 4sec | 
+| {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-11-wc-without-combiner.pl}} | 5mins, 4sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-11-wc-without-combiner.pl'<br>rm -rf step-11-out-wout; time perl step-11-wc-without-combiner.pl run /home/straka/wiki/cs-text/ step-11-out-wout</pre></html> 
-| {{:courses:mapreduce-tutorial:step-10.txt|wc-with-combiner.pl}} | 5mins, 33sec | 
+| {{:courses:mapreduce-tutorial:step-10.txt|step-11-wc-with-combiner.pl}} | 5mins, 33sec  | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-10.txt' -O 'step-11-wc-with-combiner.pl'<br>rm -rf step-11-out-with-combiner; time perl step-11-wc-with-combiner.pl run /home/straka/wiki/cs-text/ step-11-out-with-combiner</pre></html>
-| {{:courses:mapreduce-tutorial:step-11-solution.txt|wc-with-perl-hash.pl}} | 2mins, 24sec |
+| {{:courses:mapreduce-tutorial:step-11-solution.txt|step-11-wc-with-perl-hash.pl}} | 2mins, 24sec | <html><pre>wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-11-solution.txt' -O 'step-11-wc-with-perl-hash.pl'<br>rm -rf step-11-out-with-perl-hash; time perl step-11-wc-with-perl-hash.pl run /home/straka/wiki/cs-text/ step-11-out-with-perl-hash</pre></html>
  
 For comparison, here are times of Java solutions: For comparison, here are times of Java solutions:
Line 36: Line 51:
 | Wordcount with combiner | 1min, 51sec | 51MB | | Wordcount with combiner | 1min, 51sec | 51MB |
 | Wordcount with hash in mapper | 1min, 14sec | 51MB | | Wordcount with hash in mapper | 1min, 14sec | 51MB |
-Using the combiner is beneficial, although manually combining the word occurrences in mapper manually is still faster.
+Using the combiner is beneficial, although combining the word occurrences in mapper manually is still faster. 
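
To put the reported timings side by side, the short Python sketch below converts them to seconds and expresses each one relative to the fastest run in its group. The numbers are copied from the tables on this page; only the two Java rows visible in this comparison are included. The helper names ''to_seconds'' and ''relative_to_fastest'' are made up for this example.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xmins, Ysec' timing into seconds."""
    return 60 * minutes + seconds


def relative_to_fastest(times):
    """Map each name to its time divided by the fastest time in the group."""
    fastest = min(times.values())
    return {name: seconds / fastest for name, seconds in times.items()}


# Perl timings on /home/straka/wiki/cs-text, as reported in the table.
perl_times = {
    "step-11-wc-without-combiner.pl": to_seconds(5, 4),
    "step-11-wc-with-combiner.pl": to_seconds(5, 33),
    "step-11-wc-with-perl-hash.pl": to_seconds(2, 24),
}

# Java timings: only the rows shown in this comparison view.
java_times = {
    "Wordcount with combiner": to_seconds(1, 51),
    "Wordcount with hash in mapper": to_seconds(1, 14),
}

if __name__ == "__main__":
    for group in (perl_times, java_times):
        for name, ratio in relative_to_fastest(group).items():
            print(f"{name}: {ratio:.2f}x the fastest time")
```

With these figures, the Perl script with a combiner takes about 2.3 times as long as the hash-in-mapper version, and the Java wordcount with a combiner takes 1.5 times as long as its hash-in-mapper counterpart.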
 + 
 +----
  
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-12|Step 12]]: Additional output from mappers and reducers.<html></td>
 +</tr>
 +</table>
 +</html>
