Institute of Formal and Applied Linguistics Wiki
courses:mapreduce-tutorial:step-6 [2012/02/06 13:55] (current)
straka
  
So far all our Hadoop jobs were executed locally, but all of them can also be executed on multiple machines. It suffices to add the parameter ''-c number_of_machines'' when running them:
  perl script.pl -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory
This command creates a cluster of the specified number of machines. Every machine can run two mappers and two reducers simultaneously. In order to observe the counters, status and error logs of the computation after it ends, the parameter ''-w sec_to_wait_after_job_completion'' can be used -- when it is given, the cluster waits for the specified time after the job finishes (successfully or not) before shutting down.
  
One of the machines in the cluster is a //master//, or a //job tracker//, and it is used to identify the cluster.
===== Web interface =====
  
The cluster master provides a web interface at the address printed by the ''hadoop-cluster'' script. The address is also present on the second line of ''script.pl.c$SGE_JOBID'', or can be obtained using ''qstat -j $SGE_JOBID'' (context variable ''hdfs_jobtracker_admin'').
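For scripting, the address can also be pulled out of the ''qstat'' output. A minimal sketch, with the ''qstat'' line simulated because the exact output format depends on the SGE installation (the sample address ''http://pandora3:50030'' is hypothetical):

```shell
# Simulated line from `qstat -j $SGE_JOBID` output; the real output
# contains many fields, one of them the context variable
# hdfs_jobtracker_admin holding the web interface address.
qstat_line='context: hdfs_jobtracker_admin=http://pandora3:50030'

# Keep only what follows `hdfs_jobtracker_admin=`.
echo "$qstat_line" | sed -n 's/.*hdfs_jobtracker_admin=//p'
```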
  
The web interface provides a lot of useful information:
  * for any job, the counters and outputs of all mappers and reducers
  * for any job, all Hadoop settings
===== Example =====
Try running the {{:courses:mapreduce-tutorial:step-6.txt|step-6-wordcount.pl}} using
   wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-6.txt' -O 'step-6-wordcount.pl'   wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-6.txt' -O 'step-6-wordcount.pl'
  rm -rf step-6-out; perl step-6-wordcount.pl -c 1 -w 600 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium step-6-out
and explore the web interface.
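When the job succeeds, its results end up in ''step-6-out''. Assuming the usual Hadoop convention of ''part-*'' output files (an assumption; the exact file names depend on the job), they can be inspected with a plain ''cat'' -- simulated here with a stand-in directory and made-up content so the snippet is self-contained:

```shell
# Stand-in for a real job output directory (hypothetical content).
mkdir -p step-6-out
printf 'the\t1234\n' > step-6-out/part-00000

# Concatenate all output partitions.
cat step-6-out/part-*
```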
  
on your computer to create a tunnel from local port 50030 to machine ''pandora3:50030''. Replace **''pandora3''** by your cluster_master, but leave the hostname **''geri.ms.mff.cuni.cz''** unmodified. Now you can access the web interface on the URL [[http://localhost:50030]]
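Such a tunnel can be created with a plain ''ssh'' port forward; a sketch, where ''pandora3'' stands for your actual cluster master:

```shell
# Forward local port 50030 to the job tracker's web interface.
# Replace pandora3 with your cluster master; keep the login host
# geri.ms.mff.cuni.cz as in the text. -N: no remote command,
# just forwarding; terminate with Ctrl+C when done.
ssh -N -L 50030:pandora3:50030 geri.ms.mff.cuni.cz
```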
  
----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-5|Step 5]]: Basic reducer.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-7|Step 7]]: Dynamic Hadoop cluster for several computations.<html></td>
</tr>
</table>
</html>
