Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-6 [2012/01/24 22:40] straka |
courses:mapreduce-tutorial:step-6 [2012/02/06 13:55] (current) straka |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== MapReduce Tutorial : Running on cluster ====== | ====== MapReduce Tutorial : Running on cluster ====== | ||
+ | Probably the most important feature of MapReduce is its ability to run computations in a distributed fashion. | ||
+ | So far, all our Hadoop jobs were executed locally, but all of them can also be executed on multiple machines. It suffices to add the parameter '' | ||
+ | perl script.pl -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory | ||
+ | This command creates a cluster of the specified number of machines. Every machine is able to run two mappers and two reducers simultaneously. In order to be able to observe the counters, status and error logs of the computation after it ends, parameter '' | ||
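As a rough illustration of what this means for capacity, the sketch below estimates how many tasks can run at once, assuming the figure of two mappers and two reducers per machine stated above. The helper name `cluster_slots` is hypothetical and is not part of the tutorial scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tutorial scripts): estimates how many
# map and reduce tasks can run simultaneously on a cluster of N machines,
# assuming two mapper slots and two reducer slots per machine.
cluster_slots() {
    machines=$1
    echo "mappers=$((machines * 2)) reducers=$((machines * 2))"
}

# Example: a cluster created with -c 5
cluster_slots 5
```

Under this assumption, a five-machine cluster could run up to ten mappers and ten reducers at the same time.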
+ | |||
+ | One of the machines in the cluster is a //master//, or a //job tracker//, and it is used to identify the cluster. | ||
+ | |||
+ | In the UFAL environment, | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | When the computation ends and is waiting because of the '' | ||
+ | |||
+ | ===== Web interface ===== | ||
+ | |||
+ | The cluster master provides a web interface at the address printed by the '' | ||
+ | |||
+ | The web interface provides a lot of useful information: | ||
+ | * running, failed and successfully completed jobs | ||
+ | * for a running job, the current progress and counters of the whole job and of each mapper and reducer | ||
+ | * for any job, the counters and outputs of all mappers and reducers | ||
+ | * for any job, all Hadoop settings | ||
+ | |||
+ | |||
+ | |||
+ | ===== Example ===== | ||
+ | |||
+ | Try running the {{: | ||
+ | wget --no-check-certificate ' | ||
+ | rm -rf step-6-out; perl step-6-wordcount.pl -c 1 -w 600 -Dmapred.max.split.size=1000000 / | ||
+ | and explore the web interface. | ||
+ | |||
+ | If you cannot directly access the '' | ||
+ | ssh -N -L 50030: | ||
+ | on your computer to create a tunnel from local port 50030 to machine '' | ||
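The general shape of such a tunnel command can be sketched as follows. This is only an illustration: the helper `tunnel_command` and every host name passed to it are hypothetical placeholders, not the actual machines of any cluster. The helper only prints the command so that it can be inspected before being run.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an ssh port-forwarding command.
# All host names used with it below are placeholders, not real machines.
tunnel_command() {
    local_port=$1    # port opened on your own computer
    master_host=$2   # cluster master, as seen from the gateway machine
    remote_port=$3   # port of the web interface on the master
    gateway=$4       # machine you are able to reach over ssh
    # -N: do not run a remote command; -L: forward local_port to master_host:remote_port
    echo "ssh -N -L ${local_port}:${master_host}:${remote_port} ${gateway}"
}

# Example with placeholder names
tunnel_command 50030 master.example.org 50030 user@gateway.example.org
```

While such a tunnel is open, the web interface would be reachable in a browser at `http://localhost:50030` on your computer.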
+ | |||