Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-6 [2012/01/27 16:57] straka |
courses:mapreduce-tutorial:step-6 [2012/01/31 09:41] straka Change Perl commandline syntax. |
||
---|---|---|---|
Line 4: | Line 4: | ||
So far all our Hadoop jobs were executed locally. But all of them can be executed on multiple machines. It suffices to add parameter '' | So far all our Hadoop jobs were executed locally. But all of them can be executed on multiple machines. It suffices to add parameter '' | ||
- | perl script.pl | + | perl script.pl -c number_of_machines [-w sec_to_wait_after_job_completion] input_directory output_directory |
- | This command creates a cluster of the specified number of machines. Every machine is able to run two mappers and two reducers simultaneously. In order to be able to observe the status of the computation after it ends, parameter '' | + | This command creates a cluster of the specified number of machines. Every machine is able to run two mappers and two reducers simultaneously. In order to be able to observe the counters, |
One of the machines in the cluster is a //master//, or a //job tracker//, and it is used to identify the cluster. | One of the machines in the cluster is a //master//, or a //job tracker//, and it is used to identify the cluster. | ||
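The command-line syntax on the new side of this hunk can be sketched as a small shell helper that only prints the command instead of running it. This is a minimal sketch under stated assumptions: ''cluster_cmd'' is a hypothetical helper name, and the script and directory names are the placeholders from the syntax line, not real paths. Only the ''-c'' and ''-w'' options come from the text above.

```shell
# cluster_cmd SCRIPT MACHINES INPUT OUTPUT [WAIT_SECONDS]
# Prints (does not execute) a cluster invocation in the form
#   perl script.pl -c number_of_machines [-w sec] input_directory output_directory
# The helper name and its argument order are illustrative assumptions.
cluster_cmd() {
    if [ "$#" -lt 4 ]; then
        echo "usage: cluster_cmd script machines input output [wait_seconds]" >&2
        return 1
    fi
    cmd="perl $1 -c $2"
    # -w is optional in the documented syntax; add it only when a wait time is given.
    if [ -n "${5:-}" ]; then
        cmd="$cmd -w $5"
    fi
    echo "$cmd $3 $4"
}

# Example: two machines, keep the cluster up for 600 seconds after the job ends.
cluster_cmd script.pl 2 input_directory output_directory 600
```

Printing the command first makes it easy to check the option order before a cluster is actually allocated.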
Line 24: | Line 24: | ||
* for any job, the counters and outputs of all mappers and reducers | * for any job, the counters and outputs of all mappers and reducers | ||
* for any job, all Hadoop settings | * for any job, all Hadoop settings | ||
+ | |||
+ | ===== If things go wrong ===== | ||
+ | |||
+ | If the Hadoop job crashes, there are several things you can do: | ||
+ | * run the computation locally in single-threaded mode. This is more useful for Hadoop jobs written in Java, because then you can use a debugger. When using the Perl API, new subprocesses are created for Perl tasks anyway. | ||
+ | * use standard error output for log messages. You can access the stderr logs of all Hadoop tasks using the web interface. | ||
===== Example ===== | ===== Example ===== | ||
- | Try running the {{: | + | Try running the {{: |
- | perl wordcount.pl | + | |
+ | rm -rf step-6-out; | ||
and explore the web interface. | and explore the web interface. | ||
If you cannot directly access the '' | If you cannot directly access the '' | ||
- | ssh -N -L 50030: | + | ssh -N -L 50030: |
- | to create a tunnel from local port 50030 to machine '' | + | on your computer |
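The tunnel command in this hunk is truncated, so the following is only a sketch of the general ''ssh'' local port forwarding form, not the page's exact command. The host names ''master.example.org'' and ''gateway.example.org'' and the helper name ''tunnel_cmd'' are hypothetical. Port 50030 is taken from the text above. The helper prints the command rather than opening a connection.

```shell
# tunnel_cmd MASTER_HOST GATEWAY_HOST [PORT]
# Prints an ssh command that forwards a local port to the same port on
# MASTER_HOST, connecting through GATEWAY_HOST.
#   -N  do not run a remote command, only forward ports
#   -L  local_port:target_host:target_port
# Host names used below are placeholders, not taken from the tutorial.
tunnel_cmd() {
    port="${3:-50030}"
    echo "ssh -N -L $port:$1:$port $2"
}

# Example with hypothetical hosts; prints the command to run on your own computer.
tunnel_cmd master.example.org gateway.example.org
```

Once such a tunnel is running, the web interface of the master would be reachable in a browser at ''localhost:50030''.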