
Institute of Formal and Applied Linguistics Wiki



MapReduce Tutorial: Hadoop properties

So far we have controlled Hadoop jobs through the Perl API, which is quite limited.

Hadoop itself uses many configuration options. Every option has a (dot-separated) name and a value, and can be set on the command line using the -Dname=value syntax:

perl script.pl run [-jt cluster_master | -c cluster_size [-w sec_to_wait]] [-r number_of_reducers] [Hadoop options] input_path output_path

Note that the order of options matters: -jt, -c, -w and -r must precede the Hadoop options to be recognized.
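The ordering rule can be sketched as a small shell check that inspects an argument list before it is passed to script.pl. The function name check_order is hypothetical and not part of the Perl API. How the script itself reacts to a misplaced option is not stated here, so the sketch only encodes the rule.

```shell
# check_order ARGS...: succeed only if no wrapper option (-jt, -c, -w, -r)
# appears after the first Hadoop -D option. Hypothetical helper, not part of
# the Perl API.
check_order() {
  seen_hadoop=no
  for arg in "$@"; do
    case $arg in
      -D*)
        seen_hadoop=yes
        ;;
      -jt|-c|-w|-r)
        if [ "$seen_hadoop" = yes ]; then
          echo "option $arg must precede Hadoop options" >&2
          return 1
        fi
        ;;
    esac
  done
  return 0
}

# Wrapper options first, then Hadoop options, then the two paths: accepted.
check_order -jt cluster_master -Dmapred.reduce.tasks=4 input_path output_path

# Wrapper option after a Hadoop option: rejected with a message on stderr.
check_order -Dmapred.job.tracker=cluster_master -r 4 input_path output_path || true
```

An argument list that passes the check can then be appended to perl script.pl run unchanged.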

Every Hadoop option has a read-only default value. The defaults are overridden by cluster-specific options, and all of these are in turn overridden by job-specific options given on the command line (or set using the Java API).
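The three levels of precedence can be illustrated with a short shell sketch. It is not how Hadoop implements its configuration. The function name resolve and the sample values are assumptions made for illustration, and only the override order is shown.

```shell
# resolve DEFAULT CLUSTER JOB: print the effective value of a single property.
# An empty string means "not set at this level". Hypothetical helper that
# models only the override order, not Hadoop's actual configuration loading.
resolve() {
  value=$1                              # read-only default
  if [ -n "$2" ]; then value=$2; fi     # cluster-specific value overrides it
  if [ -n "$3" ]; then value=$3; fi     # job-specific value overrides both
  printf '%s\n' "$value"
}

resolve 1 "" ""   # prints 1 (only the default is set)
resolve 1 2 ""    # prints 2 (the cluster-specific value wins)
resolve 1 2 5     # prints 5 (the job-specific value wins)
```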

Mapping of Perl options to Hadoop

Perl options                    Hadoop options
no options (running locally)
-jt cluster_master              -Dmapred.job.tracker=cluster_master
-c cluster_machines             configuration of new cluster contains
-r number_of_reducers           -Dmapred.reduce.tasks=number_of_reducers
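The -jt and -r entries of the mapping, the only ones given with explicit -D equivalents, can be written down as a small shell function. This is a sketch under the assumption that the translation is a plain textual substitution. The name translate_opts is hypothetical, and the -c option is deliberately passed through untouched because its Hadoop counterpart is not fully specified above.

```shell
# translate_opts ARGS...: print ARGS with "-jt X" and "-r N" rewritten to the
# corresponding Hadoop -D options. All other arguments are passed through
# unchanged. Hypothetical helper, not part of the Perl API.
translate_opts() {
  out=""
  while [ $# -gt 0 ]; do
    case $1 in
      -jt|-r)
        if [ $# -lt 2 ]; then
          echo "missing value for $1" >&2
          return 1
        fi
        if [ "$1" = "-jt" ]; then
          opt="-Dmapred.job.tracker=$2"
        else
          opt="-Dmapred.reduce.tasks=$2"
        fi
        shift 2
        ;;
      *)
        opt=$1
        shift
        ;;
    esac
    out="${out:+$out }$opt"   # join the translated arguments with spaces
  done
  printf '%s\n' "$out"
}

translate_opts -jt cluster_master -r 4 input_path output_path
# prints: -Dmapred.job.tracker=cluster_master -Dmapred.reduce.tasks=4 input_path output_path
```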
