MapReduce Tutorial : Making your job configurable
Sometimes it is desirable for a Hadoop job to be configurable without recompiling or rewriting the source. This can be achieved as follows:
- Java: use Hadoop properties:
  - When running the job, pass the properties on the command line:
    `/net/projects/hadoop/bin/hadoop job.jar -Dname1=value1 -Dname2=value2 … input output`
  - In the job, use `job.getConfiguration().get("name", default)` to get the value as a `String`, or use one of `getInt`, `getLong`, `getFloat`, `getRange`, `getFile`, `getStrings`, ….
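Hadoop itself is not needed to see the pattern: `Configuration.get(name, default)` is a lookup that falls back to a supplied default when the property was not set on the command line. The sketch below reproduces that behaviour with plain JVM system properties (also set via `-D`); the method names `get` and `getInt` mirror the Hadoop ones but are local stand-ins, not the Hadoop API.

```java
public class ConfigDemo {
    // Stand-in for Configuration.get(name, defaultValue):
    // return the property value if set, otherwise the default.
    static String get(String name, String defaultValue) {
        String v = System.getProperty(name);
        return v != null ? v : defaultValue;
    }

    // Stand-in for Configuration.getInt(name, defaultValue).
    static int getInt(String name, int defaultValue) {
        String v = System.getProperty(name);
        return v != null ? Integer.parseInt(v) : defaultValue;
    }

    public static void main(String[] args) {
        // Simulate `-Dmapper.separator=,` on the command line.
        System.setProperty("mapper.separator", ",");
        System.out.println(get("mapper.separator", "\t")); // set, so ","
        System.out.println(getInt("mapper.limit", 10));    // unset, so the default 10
    }
}
```

In a real job the same calls would be made against `job.getConfiguration()` inside the mapper or reducer, typically in `setup()`.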
- Perl: use environment variables:
  - When constructing `Hadoop::Runner`, use `copy_environment => ['VARIABLE1', 'VARIABLE2', …]`.
  - Run the job using
    `VARIABLE1=value1 VARIABLE2=value2 … perl script.pl input output`
  - In the job, use `$ENV{VARIABLE1}` to access the value.
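The Perl side is simply reading the process environment, as `$ENV{VARIABLE1}` does. For consistency with the Java example above, the same lookup (with a fallback default, which a Perl job would express as `$ENV{VARIABLE1} // 'default'`) can be sketched in Java; the helper name `envOr` is illustrative, not part of any library.

```java
public class EnvDemo {
    // Read an environment variable, falling back to a default when it is unset,
    // analogous to Perl's $ENV{NAME} // 'default'.
    static String envOr(String name, String defaultValue) {
        String v = System.getenv(name);
        return v != null ? v : defaultValue;
    }

    public static void main(String[] args) {
        // A deliberately unset variable falls through to the default.
        System.out.println(envOr("HADOOP_TUTORIAL_DEMO_VAR", "fallback"));
    }
}
```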
