MapReduce Tutorial : Making your job configurable
Sometimes it is desirable for a Hadoop job to be configurable without recompiling or rewriting the source. This can be achieved as follows:
- Java: use Hadoop properties:
  - When running the job, specify the properties on the command line:
    /net/projects/hadoop/bin/hadoop job.jar -Dname1=value1 -Dname2=value2 … input output
  - In the job, use job.getConfiguration().get("name", default) to get the value as a String, or use one of getInt, getLong, getFloat, getRange, getFile, getStrings, …
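A minimal sketch of a mapper reading such properties in its setup method; the property names greeting and repeats are made up for illustration, and would correspond to passing -Dgreeting=… -Drepeats=… on the command line:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GreetingMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String greeting;
    private int repeats;

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        // The second argument is the default, used when the property
        // was not set with -D on the command line.
        greeting = conf.get("greeting", "Hi");
        repeats = conf.getInt("repeats", 1);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit each input line `repeats` times, keyed by the greeting.
        for (int i = 0; i < repeats; i++) {
            context.write(new Text(greeting), value);
        }
    }
}
```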
- Perl: use environment variables:
  - When constructing Hadoop::Runner, use copy_environment => ['VARIABLE1', 'VARIABLE2', …].
  - Run the job using:
    VARIABLE1=value1 VARIABLE2=value2 … perl script.pl input output
  - In the job, use $ENV{VARIABLE1} to access the value.
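The environment-variable mechanism itself can be checked outside Hadoop; a minimal sketch, with a shell one-liner standing in for script.pl:

```shell
# A variable assignment prefixed to a command is passed only to that
# child process; inside script.pl it would be read as $ENV{VARIABLE1}.
VARIABLE1=value1 sh -c 'echo "$VARIABLE1"'
```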