Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-9 [2012/01/25 15:46]
straka created
courses:mapreduce-tutorial:step-9 [2012/01/31 09:42]
straka Change Perl commandline syntax.
Line 1: Line 1:
-====== MapReduce Tutorial :  ======
 +====== MapReduce Tutorial : Hadoop properties ====== 
 + 
 +We have controlled the Hadoop jobs using the Perl API so far, which is quite limited. 
 + 
 +Hadoop itself uses many configuration options. They can be set on the command line using the ''-Dname=value'' syntax: 
 +  perl script.pl [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [-Dname=value -Dname=value ...] input output_path 
 +Note that the order of options matters: the ''-jt'', ''-c'', ''-w'' and ''-r'' options must precede the Hadoop options to be recognized. 
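
The ordering rule can be illustrated with a small Python model. This is only a sketch of the rule as stated above, not the actual implementation of the Perl API: the function name ''split_args'' is hypothetical, and the source does not say what the script does with options given in the wrong order.

```python
def split_args(argv):
    """Split a command line into script options, Hadoop options and paths.

    Hypothetical model of the documented syntax:
      [-jt master | -c size [-w sec]] [-r reducers] [-Dname=value ...] input output
    """
    script_flags = {"-jt", "-c", "-w", "-r"}  # each takes exactly one value
    script_opts, hadoop_opts = {}, {}
    i = 0

    # Phase 1: script options are only recognized at the very front.
    while i < len(argv) and argv[i] in script_flags:
        if i + 1 >= len(argv):
            raise ValueError(f"option {argv[i]} needs a value")
        script_opts[argv[i]] = argv[i + 1]
        i += 2

    # Phase 2: Hadoop options in -Dname=value form.
    while i < len(argv) and argv[i].startswith("-D"):
        name, sep, value = argv[i][2:].partition("=")
        if not name or not sep:
            raise ValueError(f"malformed Hadoop option: {argv[i]}")
        hadoop_opts[name] = value
        i += 1

    # Whatever remains is treated as positional (input and output path).
    return script_opts, hadoop_opts, argv[i:]


if __name__ == "__main__":
    good = ["-c", "10", "-r", "5", "-Dmapred.compress.map.output=true", "input", "output"]
    bad = ["-Dmapred.compress.map.output=true", "-r", "5", "input", "output"]
    print(split_args(good))
    print(split_args(bad))  # here -r is no longer recognized as a script option
```

In this model a ''-r'' placed after a ''-D'' option ends up among the positional arguments, which is one way to picture why the documented order has to be respected.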
 + 
 +Every Hadoop option has a read-only default. These defaults are overridden by cluster-specific options, which are in turn overridden by job-specific options given on the command line (or set using the Java API). 
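
The three layers can be pictured as successive dictionary updates. The following Python sketch only models the precedence described above; it is not how Hadoop implements its configuration. The function name ''effective_config'' and the cluster and job values are made up for illustration, while the default values are taken from the table of options on this page.

```python
def effective_config(defaults, cluster, job):
    """Merge three option layers; later layers win over earlier ones."""
    merged = dict(defaults)  # 1. read-only defaults
    merged.update(cluster)   # 2. cluster-specific options override defaults
    merged.update(job)       # 3. job-specific options override everything
    return merged


if __name__ == "__main__":
    defaults = {
        "mapred.reduce.tasks": "1",
        "mapred.compress.map.output": "false",
    }
    cluster = {"mapred.reduce.tasks": "4"}        # hypothetical cluster setting
    job = {"mapred.compress.map.output": "true"}  # e.g. given as -Dname=value
    for name, value in sorted(effective_config(defaults, cluster, job).items()):
        print(f"{name} = {value}")
```

An option that appears in none of the overriding layers keeps its default, and a job-specific value always wins over a cluster-specific one.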
 + 
 +===== A brief list of Hadoop options ===== 
 +^ Hadoop option ^ Default value ^ Description ^ 
 +| ''mapred.job.tracker'' | ? | Cluster master. | 
 +| ''mapred.reduce.tasks'' | 1 | Number of reducers. | 
 +| ''mapred.min.split.size'' | 1 | Minimum size of file split in bytes. | 
 +| ''mapred.max.split.size'' | 2%%^%%63-1 | Maximum size of file split in bytes. | 
 +| ''mapred.map.tasks.speculative.execution'' | true | If true, then multiple instances of some map tasks may be executed in parallel. | 
 +| ''mapred.reduce.tasks.speculative.execution'' | true | If true, then multiple instances of some reduce tasks may be executed in parallel. | 
 +| ''mapred.compress.map.output'' | false | Whether the outputs of the maps are compressed before being sent across the network. Uses SequenceFile compression. | 
 + 
 +A more complete (but not exhaustive) list can be found [[http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html|here]]. 
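
To see how ''mapred.min.split.size'' and ''mapred.max.split.size'' interact, the Python sketch below mirrors the split size rule used by ''FileInputFormat'' in the newer ''org.apache.hadoop.mapreduce'' API, which clamps the file system block size between the two limits. It assumes the job uses that API; the 64 MiB block size is an assumed example value, not something stated on this page.

```python
LONG_MAX = 2**63 - 1  # default of mapred.max.split.size


def compute_split_size(block_size, min_size=1, max_size=LONG_MAX):
    """Clamp the block size between the minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))


if __name__ == "__main__":
    MiB = 1024 * 1024
    block = 64 * MiB  # assumed block size of the input file
    print(compute_split_size(block))                     # defaults: one split per block
    print(compute_split_size(block, max_size=16 * MiB))  # smaller splits, more mappers
    print(compute_split_size(block, min_size=128 * MiB)) # larger splits, fewer mappers
```

With the default limits the split size equals the block size, so lowering the maximum produces more map tasks and raising the minimum produces fewer.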
 + 
 +===== Mapping of Perl options to Hadoop ===== 
 +^ Perl options ^ Hadoop options ^ 
 +| no options \\ (running locally) | ''-Dmapred.job.tracker=local'' \\ ''-Dmapred.local.dir=hadoop-localrunner-tmp'' \\ ''-Dhadoop.tmp.dir=hadoop-localrunner-tmp'' |
 +| ''-jt cluster_master'' | ''-Dmapred.job.tracker=cluster_master'' |
 +| ''-c cluster_size'' | The configuration of the new cluster contains \\ ''-Dmapred.job.tracker=cluster_master'' |
 +| ''-r number_of_reducers'' | ''-Dmapred.reduce.tasks=number_of_reducers'' |
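
The mapping in the table can be written down as a small translation function. This Python sketch is an illustration only: ''to_hadoop_options'' and its ''new_cluster_master'' parameter are hypothetical names, and for ''-c'' the table says the tracker address is part of the new cluster's configuration, so passing it in as an argument is a simplification.

```python
def to_hadoop_options(script_opts, new_cluster_master=None):
    """Translate Perl-style options (a dict) into -Dname=value strings."""
    if "-jt" in script_opts:
        # Use an existing cluster.
        opts = [f"-Dmapred.job.tracker={script_opts['-jt']}"]
    elif "-c" in script_opts:
        # A new cluster is started; its master is only known afterwards.
        if new_cluster_master is None:
            raise ValueError("master of the new cluster is not known yet")
        opts = [f"-Dmapred.job.tracker={new_cluster_master}"]
    else:
        # No cluster options: run locally.
        opts = [
            "-Dmapred.job.tracker=local",
            "-Dmapred.local.dir=hadoop-localrunner-tmp",
            "-Dhadoop.tmp.dir=hadoop-localrunner-tmp",
        ]
    if "-r" in script_opts:
        opts.append(f"-Dmapred.reduce.tasks={script_opts['-r']}")
    return opts


if __name__ == "__main__":
    print(to_hadoop_options({}))
    print(to_hadoop_options({"-jt": "cluster_master", "-r": "5"}))
```

Treating ''-jt'' and ''-c'' as alternatives follows the command line syntax, where the two are separated by ''|''.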
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-8|Step 8]]: Multiple mappers, reducers and partitioning.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-10|Step 10]]: Combiners.<html></td> 
 +</tr> 
 +</table> 
 +</html>