
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-9 (2012/01/31, straka)
We have controlled Hadoop jobs using the Perl API so far, which is quite limited.
  
Hadoop itself uses many configuration options. Every option has a (dot-separated) name and a value, and can be set on the command line using the ''-Dname=value'' syntax:
  perl script.pl [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [-Dname=value -Dname=value ...] input_path output_path
Mind that the order of options matters -- ''-jt'', ''-c'', ''-w'' and ''-r'' must precede the Hadoop ''-D'' options to be recognized.
  
Every Hadoop option has a read-only default. These defaults are overridden by cluster-specific options, and all of these are in turn overridden by job-specific options given on the command line (or set using the Java API).
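The three-level precedence (read-only defaults, then cluster-specific options, then job-specific ''-D'' options) can be sketched as successive dictionary updates. The option names below are real Hadoop options, but the resolver itself is only an illustration of the layering, not Hadoop's actual implementation, and the cluster address is hypothetical:

```python
# Illustrative sketch of Hadoop's configuration precedence:
# read-only defaults < cluster-specific options < job-specific -D options.

defaults = {
    "mapred.job.tracker": "local",
    "mapred.reduce.tasks": "1",
    "mapred.compress.map.output": "false",
}

# Options supplied by the cluster configuration (address is hypothetical).
cluster_options = {
    "mapred.job.tracker": "cluster_master:9001",
}

# Options parsed from the command line, e.g. -Dmapred.reduce.tasks=4.
job_options = {
    "mapred.reduce.tasks": "4",
}

def resolve(defaults, cluster_options, job_options):
    """Merge the three layers; later layers override earlier ones."""
    effective = dict(defaults)
    effective.update(cluster_options)
    effective.update(job_options)
    return effective

effective = resolve(defaults, cluster_options, job_options)
print(effective["mapred.job.tracker"])   # the cluster setting wins over the default
print(effective["mapred.reduce.tasks"])  # the job's -D option wins over the default
```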
  
===== A brief list of Hadoop options =====
^ Hadoop option ^ Default value ^ Description ^
| ''mapred.job.tracker'' | local | Cluster master |
| ''mapred.reduce.tasks'' | 1 | Number of reducers |
| ''mapred.min.split.size'' | 1 | Minimum size of a file split in bytes |
| ''mapred.max.split.size'' | 2%%^%%63-1 | Maximum size of a file split in bytes |
| ''mapred.map.tasks.speculative.execution'' | true | If true, multiple instances of some map tasks may be executed in parallel |
| ''mapred.reduce.tasks.speculative.execution'' | true | If true, multiple instances of some reduce tasks may be executed in parallel |
| ''mapred.compress.map.output'' | false | Whether the outputs of the maps should be compressed before being sent across the network (uses SequenceFile compression) |
  
A more complete (though not exhaustive) list can be found [[http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html|here]].
| no options \\ (running locally) | ''-Dmapred.job.tracker=local'' \\ ''-Dmapred.local.dir=hadoop-localrunner-tmp'' \\ ''-Dhadoop.tmp.dir=hadoop-localrunner-tmp'' |
| ''-jt cluster_master'' | ''-Dmapred.job.tracker=cluster_master'' |
| ''-c cluster_machines'' | the configuration of the new cluster contains \\ ''-Dmapred.job.tracker=cluster_master'' |
| ''-r number_of_reducers'' | ''-Dmapred.reduce.tasks=number_of_reducers'' |
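The shorthand flags in the table above are thus just aliases for ''-D'' options. A minimal sketch of such a flag expansion — a hypothetical illustration, not the tutorial's actual Perl wrapper — could look like:

```python
# Illustrative sketch: expand the wrapper's shorthand flags (-jt, -r) into
# the equivalent Hadoop -Dname=value options, mirroring the table above.
# This is a hypothetical helper, not the tutorial's actual Perl wrapper.

def expand_flags(args):
    expanded = []
    i = 0
    while i < len(args):
        if args[i] == "-jt" and i + 1 < len(args):
            expanded.append("-Dmapred.job.tracker=" + args[i + 1])
            i += 2
        elif args[i] == "-r" and i + 1 < len(args):
            expanded.append("-Dmapred.reduce.tasks=" + args[i + 1])
            i += 2
        else:
            expanded.append(args[i])  # pass through -D options, paths, ...
            i += 1
    return expanded

print(expand_flags(["-jt", "master:9001", "-r", "4", "input", "output"]))
# ['-Dmapred.job.tracker=master:9001', '-Dmapred.reduce.tasks=4', 'input', 'output']
```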
  
----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-8|Step 8]]: Multiple mappers, reducers and partitioning.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td>
</tr>
</table>
</html>
