We have controlled the Hadoop jobs using the Perl API so far, which is quite limited.

Hadoop itself uses many configuration options. Every option has a (dot-separated) name and a value, and can be set on the command line using the ''-Dname=value'' syntax:
  perl script.pl [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path
Mind that the order of the options matters -- ''-jt'', ''-c'', ''-w'' and ''-r'' must precede the Hadoop options to be recognized.

Every Hadoop option has a read-only default. The defaults are overridden by cluster-specific options, and all of these are in turn overridden by job-specific options given on the command line (or set using the Java API).

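For illustration, a concrete invocation might combine the Perl options (first) with several Hadoop options (the script name, cluster master and paths below are placeholders, and the option values are made up for this example):

```shell
# Run on a cluster via its master, with 4 reducers, a larger minimum
# split size and compressed map output. The Perl options (-jt, -r)
# come first; the -D Hadoop options follow, before the paths.
# (script.pl, cluster_master, input_path and output_path are placeholders.)
perl script.pl -jt cluster_master -r 4 \
    -Dmapred.min.split.size=1048576 \
    -Dmapred.compress.map.output=true \
    input_path output_path
```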
===== A brief list of Hadoop options =====
^ Hadoop option ^ Default value ^ Description ^
| ''mapred.job.tracker'' | ? | Cluster master. |
| ''mapred.reduce.tasks'' | 1 | Number of reducers. |
| ''mapred.min.split.size'' | 1 | Minimum size of a file split in bytes. |
| ''mapred.max.split.size'' | 2%%^%%63-1 | Maximum size of a file split in bytes. |
| ''mapred.map.tasks.speculative.execution'' | true | If true, multiple instances of some map tasks may be executed in parallel. |
| ''mapred.reduce.tasks.speculative.execution'' | true | If true, multiple instances of some reduce tasks may be executed in parallel. |
| ''mapred.compress.map.output'' | false | Whether the outputs of the maps should be compressed before being sent across the network. Uses SequenceFile compression. |

A more complete (though still not exhaustive) list can be found [[http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html|here]].

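The cluster-specific layer of these options typically lives in ''conf/mapred-site.xml'' on the cluster machines. A minimal illustrative fragment (the value is made up) overriding one of the defaults above could look like:

```xml
<!-- conf/mapred-site.xml: cluster-specific overrides (illustrative value) -->
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
</configuration>
```

Options set here still lose to ''-Dname=value'' given on the command line of a particular job.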
===== Mapping of Perl options to Hadoop =====
^ Perl options ^ Hadoop options ^
| no options \\ (running locally) | ''-Dmapred.job.tracker=local'' \\ ''-Dmapred.local.dir=hadoop-localrunner-tmp'' \\ ''-Dhadoop.tmp.dir=hadoop-localrunner-tmp'' |
| ''-jt cluster_master'' | ''-Dmapred.job.tracker=cluster_master'' |
| ''-c cluster_machines'' | The configuration of the new cluster contains \\ ''-Dmapred.job.tracker=cluster_master'' |
| ''-r number_of_reducers'' | ''-Dmapred.reduce.tasks=number_of_reducers'' |

----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-8|Step 8]]: Multiple mappers, reducers and partitioning.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
</tr>
</table>
</html>