
Institute of Formal and Applied Linguistics Wiki


MapReduce Tutorial : Hadoop properties

So far we have controlled Hadoop jobs only through the Perl API, which is quite limited.

Hadoop itself uses many configuration options. Every option has a (dot-separated) name and a value, and can be set on the command line using the -Dname=value syntax:

perl script.pl run [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [Hadoop options] input_path output_path

Mind that the order of options matters: the -jt, -c, -w and -r options must precede any Hadoop options in order to be recognized.
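For instance, a job with two reducers and compressed map output on a newly started cluster could be submitted as follows (the script name, cluster size, wait time and paths are placeholders, not values from this tutorial):

```shell
# Perl-level options (-c, -w, -r) come first, Hadoop options (-D...) after them,
# followed by the input and output paths.
perl script.pl run -c 4 -w 600 -r 2 \
    -Dmapred.compress.map.output=true \
    /home/user/input /home/user/output
```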

Every Hadoop option has a read-only default. These defaults are overridden by cluster-specific options, and all of these are in turn overridden by job-specific options given on the command line (or set using the Java API).

A brief list of Hadoop options

Hadoop option | Default value | Description
mapred.job.tracker | ? | Cluster master
mapred.reduce.tasks | 1 | Number of reducers
mapred.min.split.size | 1 | Minimum size of a file split in bytes
mapred.max.split.size | 2^63-1 | Maximum size of a file split in bytes
mapred.map.tasks.speculative.execution | true | If true, multiple instances of some map tasks may be executed in parallel
mapred.reduce.tasks.speculative.execution | true | If true, multiple instances of some reduce tasks may be executed in parallel
mapred.compress.map.output | false | Whether the outputs of the maps should be compressed before being sent across the network; uses SequenceFile compression
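As an illustration of overriding the defaults from the table for a single job (again with a hypothetical script name and paths), speculative execution of both map and reduce tasks can be turned off on the command line:

```shell
# The defaults (true) are overridden only for this one job;
# the cluster-wide configuration is left untouched.
perl script.pl run -r 1 \
    -Dmapred.map.tasks.speculative.execution=false \
    -Dmapred.reduce.tasks.speculative.execution=false \
    /home/user/input /home/user/output
```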

A more complete (though still not exhaustive) list can be found here.

Mapping of Perl options to Hadoop

Perl options | Hadoop options
no options | (running locally)
-jt cluster_master | -Dmapred.job.tracker=cluster_master
-c cluster_machines | the configuration of the newly started cluster contains the corresponding mapred.job.tracker
-r number_of_reducers | -Dmapred.reduce.tasks=number_of_reducers
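Following this mapping, the two invocations below should be equivalent (the master address, script name and paths are placeholders):

```shell
# Using the Perl option:
perl script.pl run -jt master.example.com:9001 -r 2 input output

# Using the underlying Hadoop option directly
# (note that -r must still precede the -D option):
perl script.pl run -r 2 -Dmapred.job.tracker=master.example.com:9001 input output
```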
