courses:mapreduce-tutorial:step-6 [2012/01/24 23:51] straka
courses:mapreduce-tutorial:step-6 [2012/01/27 16:57] straka
Probably the most important feature of MapReduce is the ability to run computations distributed over many machines.
So far all our Hadoop jobs were executed locally, but all of them can also be executed on multiple machines. It suffices to add the parameter ''-c'':

  perl script.pl run -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory

This command creates a cluster with the specified number of machines. Every machine is able to run two mappers and two reducers simultaneously. In order to be able to observe the status of the computation even after it finishes, the optional ''-w'' parameter makes the cluster wait for the given number of seconds after the job completes.
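To make concrete what the mappers and reducers do once the work is split up, here is a toy word-count simulation in plain Python. It only illustrates the map, shuffle and reduce phases — it is not the Hadoop API used by the Perl scripts, and all names in it are invented for this sketch.

```python
from collections import defaultdict

def mapper(split):
    # map phase: emit a (word, 1) pair for every word in one input split
    return [(word, 1) for word in split.split()]

def partition(pairs, n_reducers):
    # shuffle phase: route every key to exactly one reducer, here by hash
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[hash(key) % n_reducers][key].append(value)
    return buckets

def reducer(bucket):
    # reduce phase: combine all values collected for each key
    return {key: sum(values) for key, values in bucket.items()}

# two input splits processed by two mappers, results merged by two reducers
splits = ["to be or not", "to be"]
mapped = [pair for split in splits for pair in mapper(split)]
counts = {}
for bucket in partition(mapped, 2):
    counts.update(reducer(bucket))
print(sorted(counts.items()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In the real framework each mapper and reducer runs as a separate task, possibly on a different machine; only the shuffle moves data between them.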

One of the machines in the cluster is a //master//, or a //job tracker//, and it is used to identify the cluster.

In the UFAL environment:
  * ''
  * ''
  * ''
When the computation ends and the cluster is kept alive because of the ''-w'' option, its web interface can still be used to inspect the finished job.

===== Web interface =====

The cluster master provides a web interface on port 50030 (the port may change in the future). The cluster master address can be found at the first line of ''

The web interface provides a lot of useful information:
  * running, failed and successfully completed jobs
  * for a running job, the current progress and counters of the whole job, as well as of each mapper and reducer
  * for any job, the counters and outputs of all mappers and reducers
  * for any job, all Hadoop settings
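The same status pages can also be fetched programmatically, e.g. from a simple monitoring script. A minimal sketch in Python, assuming only that the master serves plain HTTP on port 50030; the helper names are invented here:

```python
from urllib.request import urlopen
from urllib.error import URLError

def jobtracker_url(host, port=50030):
    # address of the cluster master's web interface
    return "http://%s:%d/" % (host, port)

def fetch_status(host, port=50030, timeout=5):
    # return the HTML of the status page, or None when the master is unreachable
    try:
        with urlopen(jobtracker_url(host, port), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (URLError, OSError):
        return None
```

Once the cluster is running (or an SSH tunnel to it is up), ''fetch_status("localhost")'' returns the HTML of the status page.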

===== Example =====

Try running the ''wordcount.pl'' script on a cluster:

  perl wordcount.pl run -c 1 -w 600 -Dmapred.max.split.size=1000000 /

and explore the web interface.

If you cannot directly access the ''

  ssh -N -L 50030:

to create a tunnel from local port 50030 to machine ''