
MapReduce Tutorial : Running on cluster

Probably the most important feature of MapReduce is its ability to run computations in a distributed fashion.

So far, all our MR jobs have been executed locally. However, any of them can also be executed on multiple machines. It suffices to add the parameter -c number_of_machines when running them:

perl script.pl run -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory

This command creates a cluster of the specified number of machines. Each machine can run two mappers and two reducers simultaneously. To be able to observe the status of the computation, the optional parameter -w sec_to_wait_after_job_completion can be used; as its name suggests, it specifies how many seconds to wait after the job completes.
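As a rough illustration of the paragraph above, the following shell sketch computes how many mappers and reducers can run at once on a cluster of a given size and prints a matching command line without executing it. The values 4 and 120 are hypothetical, and script.pl, input_directory and output_directory are the same placeholders as in the command shown earlier. The function names total_slots and build_command are made up for this example and are not part of the tutorial's framework.

```shell
#!/bin/sh
# Hypothetical values, chosen only for illustration.
machines=4        # value passed to -c
wait_secs=120     # value passed to -w

# Per the text: every machine runs two mappers and two reducers at once.
slots_per_machine=2

# Number of tasks of one kind (mappers or reducers) that can run
# simultaneously on a cluster of $1 machines.
total_slots() {
    echo $(( $1 * slots_per_machine ))
}

# Print the command line for $1 machines and a wait of $2 seconds.
# The command is only printed, not run; script.pl and the directory
# names are placeholders.
build_command() {
    echo "perl script.pl run -c $1 -w $2 input_directory output_directory"
}

echo "simultaneous mappers:  $(total_slots "$machines")"
echo "simultaneous reducers: $(total_slots "$machines")"
build_command "$machines" "$wait_secs"
```

With these assumed values, a cluster of 4 machines can run up to 8 mappers and 8 reducers at the same time, and the printed command asks for a 120 second wait after the job completes.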


Institute of Formal and Applied Linguistics Wiki
