
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-6 (revision 2012/01/25 00:28 by straka; previous revision 2012/01/24 23:51 by straka)
So far, all our MR jobs have been executed locally, but all of them can also be executed on multiple machines. It suffices to add the parameter ''-c number_of_machines'' when running them:
  perl script.pl run -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory
This command creates a cluster with the specified number of machines. Every machine can run two mappers and two reducers simultaneously. To be able to observe the status of the computation after it ends, use the parameter ''-w sec_to_wait_after_job_completion'', which keeps the cluster running for the given number of seconds after the job completes.
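
Because every machine runs two mappers and two reducers at once, the number of machines determines how many tasks can run in parallel. The following bash sketch computes the number of map slots of a cluster and a rough number of map "waves" (rounds of tasks) needed for a given number of map tasks. The function names ''map_slots'' and ''map_waves'' are hypothetical and not part of the tutorial scripts, and the model deliberately ignores task failures, speculative execution and tasks of unequal length.

```shell
# Each machine runs two mappers and two reducers simultaneously,
# so a cluster started with "-c N" has 2*N map slots (and 2*N reduce slots).
map_slots() {
  echo $(( 2 * $1 ))
}

# Rough number of map "waves": ceil(TASKS / (2*N)) for TASKS map tasks on N machines.
# Ignores failures, speculative execution and tasks of unequal length.
map_waves() {
  local tasks=$1 slots=$(( 2 * $2 ))
  echo $(( (tasks + slots - 1) / slots ))
}

# Example: 25 map tasks on a 3-machine cluster.
map_slots 3      # prints 6
map_waves 25 3   # prints 5
```

In this model, adding machines shortens the map phase only while there are more map tasks than map slots.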
  
When a distributed MR computation is executed, it submits a job named after the Perl script to the SGE cluster. SGE creates three files in the current directory:
  * ''script.pl.c$SGE_JOBID'' -- high-level status of the computation
  * ''script.pl.o$SGE_JOBID'' -- contains stdout and stderr of the MR job
  * ''script.pl.po$SGE_JOBID'' -- contains stdout and stderr of the MR cluster
When the computation has ended and the cluster is waiting because of the ''-w'' parameter, removing the file ''script.pl.c$SGE_JOBID'' stops the cluster. The cluster can also be stopped by deleting its SGE job (for example with ''qdel $SGE_JOBID'').

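The file naming and the two ways of stopping a waiting cluster described above can be captured in a small bash sketch. The helper names ''mr_job_files'' and ''mr_stop_cluster'' are hypothetical, and the job ID used in the usage comment is only an illustration; use the ID that SGE actually assigned to your job.

```shell
# Print the three files SGE creates in the current directory for a
# distributed run of SCRIPT with SGE job ID JOBID.
mr_job_files() {
  local script=$1 jobid=$2
  printf '%s\n' "$script.c$jobid" "$script.o$jobid" "$script.po$jobid"
}

# Stop a cluster that is waiting because of -w by removing its status file.
# Deleting the SGE job instead (e.g. `qdel "$jobid"`) also stops the cluster.
mr_stop_cluster() {
  local script=$1 jobid=$2
  rm -f -- "$script.c$jobid"
}

# Usage (job ID 123456 is only an illustration):
#   mr_job_files wordcount.pl 123456
#   cat wordcount.pl.c123456            # high-level status
#   mr_stop_cluster wordcount.pl 123456
```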
===== Web interface =====

The cluster master provides a web interface on port 50030 (the port may change in the future). The address of the cluster master is on the first line of ''script.pl.c$SGE_JOBID''; it can also be obtained using ''qstat -j $SGE_JOBID'' (context variable ''hdfs_jobtracker_admin'').

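If you prefer the ''qstat'' route, a small filter can extract the context variable from the ''qstat -j'' output. This is a sketch assuming that the context is printed on a single line starting with ''context:'' as comma-separated ''name=value'' pairs and that the value itself contains no comma; the function name ''sge_context_var'' is hypothetical.

```shell
# Read `qstat -j JOBID` output on stdin and print the value of context
# variable NAME. Assumes a single "context:" line holding comma-separated
# name=value pairs whose values contain no commas.
sge_context_var() {
  local name=$1
  sed -n 's/^context:[[:space:]]*//p' | tr ',' '\n' | sed -n "s/^$name=//p"
}

# Usage:
#   qstat -j "$SGE_JOBID" | sge_context_var hdfs_jobtracker_admin
```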
The web interface provides a lot of useful information:
  * running, failed and successfully completed jobs
  * for a running job, the current progress and the counters of the whole job and of each mapper and reducer
  * for any job, the counters and outputs of all mappers and reducers
  * for any job, all Hadoop settings

===== Example =====

Try running {{:courses:mapreduce-tutorial:step-6.txt|wordcount.pl}} using
  perl wordcount.pl -c 1 -w 300 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium some_output_directory
and explore the web interface.

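The option ''-Dmapred.max.split.size=1000000'' caps input splits at about 1 MB, which should yield more map tasks for the same input than the default split size. A rough estimate of the resulting number of map tasks is sketched below, assuming every file is cut into chunks of at most ''mapred.max.split.size'' bytes and no split spans two files. Hadoop's actual computation also considers the HDFS block size and lets the last split of a file be slightly larger, so treat the result as approximate; ''estimate_map_tasks'' is a hypothetical helper.

```shell
# Rough number of map tasks for the files in DIR when every split is capped
# at MAX_SPLIT bytes: sum over files of ceil(size / MAX_SPLIT).
# Assumes splits never span files; ignores the HDFS block size and Hadoop's
# allowance for a slightly larger last split.
estimate_map_tasks() {
  local dir=$1 max_split=$2 total=0 f size
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    total=$(( total + (size + max_split - 1) / max_split ))
  done
  echo "$total"
}

# Usage:
#   estimate_map_tasks /home/straka/wiki/cs-text-medium 1000000
```

With ''-c 1'' the cluster runs only two mappers at a time, so the map tasks are processed in several successive waves.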
If you cannot access the ''*.ufal.hide.ms.mff.cuni.cz'' machines directly, you can, for example, use
  ssh -N -L 50030:pandora3:50030 geri
to create a tunnel from local port 50030 to ''pandora3:50030''; the web interface is then available at ''http://localhost:50030/''.
