MapReduce Tutorial : Making your job configurable
Sometimes it is desirable for a Hadoop job to be configurable without recompiling or rewriting the source. This can be achieved as follows:
- Java: use Hadoop properties:
  - When running the job, pass the properties on the command line:
    `/net/projects/hadoop/bin/hadoop job.jar -Dname1=value1 -Dname2=value2 … input output`
  - In the job, use `job.getConfiguration().get("name", default)` to get the value as a `String`, or use one of `getInt`, `getLong`, `getFloat`, `getRange`, `getFile`, `getStrings`, ….
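Hadoop itself is not needed to see the pattern: `Configuration.get(name, default)` is a lookup that falls back to a supplied default when the property was not set on the command line. The sketch below reproduces that behaviour with plain JVM system properties (also set via `-D`); the method names `get` and `getInt` mirror the Hadoop ones but are local stand-ins, not the Hadoop API.

```java
public class ConfigDemo {
    // Stand-in for Configuration.get(name, defaultValue):
    // return the property value if set, otherwise the default.
    static String get(String name, String defaultValue) {
        String v = System.getProperty(name);
        return v != null ? v : defaultValue;
    }

    // Stand-in for Configuration.getInt(name, defaultValue).
    static int getInt(String name, int defaultValue) {
        String v = System.getProperty(name);
        return v != null ? Integer.parseInt(v) : defaultValue;
    }

    public static void main(String[] args) {
        // Simulate `-Dmapper.separator=,` on the command line.
        System.setProperty("mapper.separator", ",");
        System.out.println(get("mapper.separator", "\t")); // set, so ","
        System.out.println(getInt("mapper.limit", 10));    // unset, so the default 10
    }
}
```

In a real job the same calls would be made against `job.getConfiguration()` inside the mapper or reducer, typically in `setup()`.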
- Perl: use environment variables:
  - When constructing `Hadoop::Runner`, use `copy_environment => ['VARIABLE1', 'VARIABLE2', …]`.
  - Run the job using
    `VARIABLE1=value1 VARIABLE2=value2 … perl script.pl input output`
  - In the job, use `$ENV{VARIABLE1}` to access the value.
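The Perl side is simply reading the process environment, as `$ENV{VARIABLE1}` does. For consistency with the Java example above, the same lookup (with a fallback default, which a Perl job would express as `$ENV{VARIABLE1} // 'default'`) can be sketched in Java; the helper name `envOr` is illustrative, not part of any library.

```java
public class EnvDemo {
    // Read an environment variable, falling back to a default when it is unset,
    // analogous to Perl's $ENV{NAME} // 'default'.
    static String envOr(String name, String defaultValue) {
        String v = System.getenv(name);
        return v != null ? v : defaultValue;
    }

    public static void main(String[] args) {
        // A deliberately unset variable falls through to the default.
        System.out.println(envOr("HADOOP_TUTORIAL_DEMO_VAR", "fallback"));
    }
}
```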
