
Page courses:mapreduce-tutorial:step-6, revision 2012/01/25 00:03 by straka (previous revision 2012/01/24 23:44 by straka)
====== MapReduce Tutorial : Running on cluster ======
Probably the most important feature of MapReduce is its ability to run computations in a distributed fashion.
So far, all our MR jobs have been executed locally, but every one of them can also be executed on multiple machines. It suffices to add the parameter ''-c number_of_machines'' when running them:
  perl script.pl run -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory
This command creates a cluster with the specified number of machines. Every machine can run two mappers and two reducers simultaneously. To be able to observe the status of the computation after it ends, use the parameter ''-w sec_to_wait_after_job_completion'', which keeps the cluster running for the given number of seconds after the job completes.
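Since every machine runs two mappers and two reducers at once, a cluster started with ''-c N'' can run at most 2N mappers and 2N reducers in parallel. The small shell sketch below makes this arithmetic concrete. The helper names ''parallel_slots'' and ''task_waves'' are hypothetical, and the "rounds" estimate assumes all map tasks take about the same time, which real jobs rarely guarantee.

```shell
#!/usr/bin/env bash
# Sketch: estimate the parallelism of a cluster started with -c N,
# based on the rule that every machine runs two mappers and two reducers
# simultaneously. Helper names are hypothetical.

# Number of mappers (or reducers) that can run at the same time.
parallel_slots() {
  local machines=$1
  echo $(( machines * 2 ))
}

# Rough number of rounds needed to run a given number of map tasks,
# ASSUMING every task takes about the same time.
task_waves() {
  local tasks=$1 machines=$2
  local slots=$(( machines * 2 ))
  # Integer ceiling division: (tasks + slots - 1) / slots
  echo $(( (tasks + slots - 1) / slots ))
}

# Example: "-c 4" gives 8 map slots and 8 reduce slots, so 20 equally
# long map tasks would need about 3 rounds.
echo "slots: $(parallel_slots 4)"
echo "rounds for 20 tasks: $(task_waves 20 4)"
```

Doubling ''-c'' doubles the number of slots, but it only shortens the map phase if the job has enough map tasks to fill them.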
When a distributed MR computation is executed, it submits a job named after the Perl script to the SGE cluster. The SGE cluster creates three files in the current directory:
  * ''script.pl.c$SGE_JOBID'' -- high-level status of the computation; the first line contains the name of the cluster master.
  * ''script.pl.o$SGE_JOBID'' -- stdout and stderr of the MR job.
  * ''script.pl.po$SGE_JOBID'' -- stdout and stderr of the MR cluster.
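Because the first line of ''script.pl.c$SGE_JOBID'' holds the name of the cluster master, it can be read with a one-line ''head'' call. The following is a minimal sketch wrapping that in a shell function; the name ''cluster_master'' and its arguments are hypothetical, and the script name and job ID must be supplied by you.

```shell
#!/usr/bin/env bash
# Sketch: print the name of the cluster master, stored on the first line
# of the status file <script>.c<job_id> in the current directory.
# The function name and its arguments are hypothetical.
cluster_master() {
  local script=$1 job_id=$2
  local status_file="${script}.c${job_id}"
  if [ ! -f "$status_file" ]; then
    echo "status file $status_file not found" >&2
    return 1
  fi
  # The first line of the status file holds the master's name.
  head -n 1 "$status_file"
}

# Usage (run in the directory where the job was submitted):
#   cluster_master script.pl "$SGE_JOBID"
```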
When the computation has ended and the cluster is waiting because of the ''-w'' parameter, removing the file ''script.pl.c$SGE_JOBID'' stops the cluster. The cluster can also be stopped by removing its SGE job.
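Stopping an idle cluster therefore amounts to deleting its status file. A minimal sketch follows; the function name ''stop_cluster'' is hypothetical. Removing the SGE job instead is done with SGE's ''qdel'' command, which is only mentioned in a comment here because it needs a live SGE installation.

```shell
#!/usr/bin/env bash
# Sketch: stop a cluster that is idling because of the -w parameter by
# removing its status file <script>.c<job_id>. The function name is
# hypothetical. Alternatively, remove the SGE job itself, e.g.:
#   qdel "$SGE_JOBID"
stop_cluster() {
  local script=$1 job_id=$2
  local status_file="${script}.c${job_id}"
  if [ -f "$status_file" ]; then
    rm -- "$status_file"
    echo "removed $status_file; the cluster should now shut down"
  else
    echo "no status file $status_file (cluster already stopped?)" >&2
    return 1
  fi
}

# Usage:
#   stop_cluster script.pl "$SGE_JOBID"
```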


Institute of Formal and Applied Linguistics Wiki
