Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-7 [2012/01/26 23:11]
straka
courses:mapreduce-tutorial:step-7 [2012/01/31 12:43]
straka
Line 11: Line 11:
 ===== Using a running cluster =====
 A running cluster is identified by its master. When running a Hadoop job using the Perl API, an existing cluster can be used by
-  perl script.pl run -jt cluster_master:9001 ...
+  perl script.pl -jt cluster_master:9001 ...
 + 
 +===== Running Hadoop jobs from now on ===== 
 + 
 +From now on, it is best to run MR jobs on a one-machine cluster: create one using ''hadoop-cluster'' for 3h (10800s) and submit jobs to it using ''-jt cluster_master''. Running the scripts locally without any cluster has several disadvantages, most notably that each job gets only one reducer.
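
The workflow described above can be sketched as a small dry-run helper. This is only an illustration: ''job_command'' is a hypothetical function name, the master host must be replaced with the name of the real cluster master, and the port 9001 is taken from the examples on this page. The helper only prints the command line; it does not contact any cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Hadoop tools): print the command line
# for submitting a Perl API job to an existing cluster via -jt.
# Arguments: script, master host, input path, output directory.
job_command() {
  script="$1"; master="$2"; input="$3"; output="$4"
  # Port 9001 is the one used in this tutorial's examples.
  printf 'perl %s -jt %s:9001 %s %s\n' "$script" "$master" "$input" "$output"
}

# Dry run: show what would be executed for the word count example.
job_command step-7-wordcount.pl cluster_master /home/straka/wiki/cs-text-medium step-7-out-sol
```

Printing the command first makes it easy to check the master name and the output directory before anything is actually submitted.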
  
 ===== Example =====
  
-Try running the same script {{:courses:mapreduce-tutorial:step-6.txt|wordcount.pl}} as in the last step, this time by creating the cluster and submitting the job to it:
+Try running the same script {{:courses:mapreduce-tutorial:step-6.txt|step-7-wordcount.pl}} as in the last step, this time by creating the cluster and submitting the job to it:
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-6.txt' -O 'step-7-wordcount.pl'
   /net/projects/hadoop/bin/hadoop-cluster -c 1 -w 600
-  perl wordcount.pl run -jt cluster_master:9001 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium some_output_directory
+  # NOW VIEW THE FILE
 +  # $EDITOR step-7-wordcount.pl 
 +  rm -rf step-7-out-sol; perl step-7-wordcount.pl -jt cluster_master:9001 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium step-7-out-sol 
 +  less step-7-out-sol/part-*
 +Remarks: 
 +  * The reducers seem to start running before the mappers finish. In the web interface, the running time of reducers is divided into thirds: 
 +    * during the first 33%, the mapper outputs are copied to the machine where reducer runs. 
 +    * during the second 33%, the (key, value) pairs are sorted. 
 +    * during the last 33%, the user-defined reducer runs. 
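
The three phases listed above can be illustrated with a small shell function that maps a reducer's progress percentage to the phase it is probably in. This is a minimal sketch: ''reduce_phase'' is a hypothetical name, the input is assumed to be an integer percentage as read off the web interface, and the boundaries at 33 and 66 are approximations of the thirds described above.

```shell
#!/bin/sh
# Hypothetical helper: name the reducer phase for a given progress percentage.
# Assumes an integer argument between 0 and 100; the thresholds approximate
# the three thirds shown in the web interface.
reduce_phase() {
  pct="$1"
  if [ "$pct" -lt 33 ]; then
    echo copy      # mapper outputs are being copied to the reducer's machine
  elif [ "$pct" -lt 66 ]; then
    echo sort      # (key, value) pairs are being sorted
  else
    echo reduce    # the user-defined reducer is running
  fi
}

reduce_phase 10
reduce_phase 50
reduce_phase 90
```

A reducer reported at 10% is therefore still copying mapper outputs, which is why it can appear to run before all mappers have finished.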
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-6|Step 6]]: Running on cluster.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-8|Step 8]]: Multiple mappers, reducers and partitioning.<html></td> 
 +</tr> 
 +</table> 
 +</html>
  
