Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-9 [2012/01/25 15:56]
straka
+++ courses:mapreduce-tutorial:step-9 [2012/01/31 09:42] (current)
straka Change Perl commandline syntax.
@@ Line 3: / Line 3: @@
 We have controlled the Hadoop jobs using the Perl API so far, which is quite limited.
-The Hadoop itself uses many configuration options. Every option has a (dot-separated) name and a value and can be set on the command line using ''-Dname=value'' syntax:
+Hadoop itself uses many configuration options. They can be set on the command line using the ''-Dname=value'' syntax:
-  perl script.pl run [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [Hadoop options] input_path output_path
+  perl script.pl [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path
-Mind that the order of options matters -- the ''-jt'', ''-c'', ''-w'' and ''-r'' must precede Hadoop options.
+Mind that the order of options matters -- the ''-jt'', ''-c'', ''-w'' and ''-r'' must precede Hadoop options to be recognized.
+Every Hadoop option has a read-only default. The defaults are overridden by cluster-specific options. Finally, both are overridden by job-specific options given on the command line (or set using the Java API).
+===== A brief list of Hadoop options =====
+^ Hadoop option ^ Default value ^ Description ^
+| ''mapred.job.tracker'' | ? | Cluster master. |
+| ''mapred.reduce.tasks'' | 1 | Number of reducers. |
+| ''mapred.min.split.size'' | 1 | Minimum size of file split in bytes. |
+| ''mapred.max.split.size'' | 2%%^%%63-1 | Maximum size of file split in bytes. |
+| ''mapred.map.tasks.speculative.execution'' | true | If true, then multiple instances of some map tasks may be executed in parallel. |
+| ''mapred.reduce.tasks.speculative.execution'' | true | If true, then multiple instances of some reduce tasks may be executed in parallel. |
+| ''mapred.compress.map.output'' | false | Whether the outputs of the maps are compressed before being sent across the network. Uses SequenceFile compression. |
+A more complete (but not exhaustive) list can be found [[http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html|here]].
+===== Mapping of Perl options to Hadoop =====
+^ Perl options ^ Hadoop options ^
+| no options \\ (running locally) | ''-Dmapred.job.tracker=local'' \\ ''-Dmapred.local.dir=hadoop-localrunner-tmp'' \\ ''-Dhadoop.tmp.dir=hadoop-localrunner-tmp'' |
+| ''-jt cluster_master'' | ''-Dmapred.job.tracker=cluster_master'' |
+| ''-c cluster_size'' | The configuration of the new cluster contains \\ ''-Dmapred.job.tracker=cluster_master'' |
+| ''-r number_of_reducers'' | ''-Dmapred.reduce.tasks=number_of_reducers'' |
+----
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-8|Step 8]]: Multiple mappers, reducers and partitioning.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
+</tr>
+</table>
+</html>
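
The override order described in the new revision (read-only defaults, then cluster-specific options, then job-specific options) can be illustrated with a minimal Python sketch. This is not how Hadoop or the Perl API implements it; the function names ''resolve_options'' and ''parse_d_options'' and the cluster address ''master:9001'' are hypothetical and used only for illustration.

```python
from collections import ChainMap

# Defaults taken from the option table in the diff above.
DEFAULTS = {
    "mapred.reduce.tasks": "1",
    "mapred.compress.map.output": "false",
}


def parse_d_options(args):
    """Collect -Dname=value arguments into a dict (hypothetical helper)."""
    options = {}
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            # Split on the first '=' only, so values may contain '='.
            name, value = arg[2:].split("=", 1)
            options[name] = value
    return options


def resolve_options(defaults, cluster_options, job_options):
    """Return effective options: job beats cluster, cluster beats defaults."""
    # ChainMap searches its maps left to right, so the
    # highest-priority layer is listed first.
    return dict(ChainMap(job_options, cluster_options, defaults))


if __name__ == "__main__":
    # Assumed cluster-level settings; real clusters will differ.
    cluster = {"mapred.job.tracker": "master:9001", "mapred.reduce.tasks": "4"}
    job = parse_d_options(["-Dmapred.reduce.tasks=10"])
    for name, value in sorted(resolve_options(DEFAULTS, cluster, job).items()):
        print(f"{name} = {value}")
```

In this sketch the job-level ''-Dmapred.reduce.tasks=10'' wins over the cluster-level value, while options set on only one level pass through unchanged.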

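The table "Mapping of Perl options to Hadoop" can also be read as a small translation function. The Python sketch below is an assumption-laden illustration, not the actual Perl API code: the function ''perl_to_hadoop_options'' is hypothetical, "no options" is taken to mean that no ''-jt'' was given, and ''-c'' is left out because starting a new cluster is outside the scope of a pure option mapping.

```python
def perl_to_hadoop_options(jt=None, reducers=None,
                           local_tmp="hadoop-localrunner-tmp"):
    """Translate Perl-style options into -Dname=value Hadoop options.

    jt       -- value of -jt (cluster master), or None for a local run
    reducers -- value of -r (number of reducers), or None if not given
    """
    options = []
    if jt is None:
        # No cluster given: run locally with a private temporary directory,
        # as in the first row of the mapping table.
        options.append("-Dmapred.job.tracker=local")
        options.append(f"-Dmapred.local.dir={local_tmp}")
        options.append(f"-Dhadoop.tmp.dir={local_tmp}")
    else:
        options.append(f"-Dmapred.job.tracker={jt}")
    if reducers is not None:
        options.append(f"-Dmapred.reduce.tasks={reducers}")
    return options


if __name__ == "__main__":
    # "master:9001" is a made-up cluster master address.
    print(" ".join(perl_to_hadoop_options(jt="master:9001", reducers=5)))
```

Because the result is a plain list of ''-Dname=value'' strings, further job-specific Hadoop options could simply be appended after it, matching the rule that the Perl options precede the Hadoop options.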

Institute of Formal and Applied Linguistics Wiki
