MapReduce Tutorial : Running jobs
The input of a Hadoop job is either a single file or a directory; in the latter case, all files in the directory are processed.
The output of a Hadoop job must be a directory that does not exist yet.
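For example, a minimal sketch of a typical run (script.pl, input and output are placeholder names):

```sh
# The input may be a single file or a directory; with a directory,
# every file inside it is processed.
# The output directory must not exist yet, so clear stale results first.
rm -rf output
perl script.pl input output
```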
Run Perl jobs
Choosing the mode of operation:

| Mode | Command |
|---|---|
| Run locally | perl script.pl input output |
| Run using the specified jobtracker | perl script.pl -jt jobtracker:port input output |
| Run in a dedicated cluster | perl script.pl -c number_of_machines input output |
| Run in a dedicated cluster; after the job finishes, wait W seconds before stopping the cluster | perl script.pl -c number_of_machines -w W_seconds input output |
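For instance, taking the last row of the table above with concrete numbers chosen only for illustration:

```sh
# -c 4: allocate a dedicated cluster of 4 machines for this job
# -w 600: after the job finishes, wait 600 s before stopping the cluster
perl script.pl -c 4 -w 600 input output
```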
Specifying the number of mappers and reducers:

| Setting | Command |
|---|---|
| Run with R reducers (R>1 does not work when running locally) | perl script.pl -r R input output |
| Run with M mappers | perl script.pl `/net/projects/hadoop/bin/compute-splitsize input M` input output |
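The two settings can be combined in one invocation; a sketch, assuming compute-splitsize prints the split-size options that divide the input among the requested number of mappers:

```sh
# Ask for 8 reducers and a split size yielding roughly 16 mappers;
# the backquoted command expands to the appropriate split-size options.
perl script.pl -r 8 `/net/projects/hadoop/bin/compute-splitsize input 16` input output
```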
Run Java jobs
Choosing the mode of operation:

| Mode | Command |
|---|---|
| Run locally | /net/projects/hadoop/bin/hadoop job.jar input output |
| Run using the specified jobtracker | /net/projects/hadoop/bin/hadoop job.jar -jt jobtracker:port input output |
| Run in a dedicated cluster | /net/projects/hadoop/bin/hadoop job.jar -c number_of_machines input output |
| Run in a dedicated cluster; after the job finishes, wait W seconds before stopping the cluster | /net/projects/hadoop/bin/hadoop job.jar -c number_of_machines -w W_seconds input output |
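As an illustration (job.jar stands for your compiled job archive), a dedicated-cluster run mirroring the Perl example above:

```sh
# Allocate 4 machines, run the job, then keep the cluster up
# for 600 more seconds before it is stopped.
/net/projects/hadoop/bin/hadoop job.jar -c 4 -w 600 input output
```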
Specifying the number of mappers and reducers:

| Setting | Command |
|---|---|
| Run with R reducers (R>1 does not work when running locally) | /net/projects/hadoop/bin/hadoop job.jar -r R input output |
| Run with M mappers | /net/projects/hadoop/bin/hadoop job.jar `/net/projects/hadoop/bin/compute-splitsize input M` input output |
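The reducer and mapper settings can again be combined, under the same assumption about compute-splitsize as in the Perl case:

```sh
# 8 reducers; the input is split so that roughly 16 mappers run.
/net/projects/hadoop/bin/hadoop job.jar -r 8 `/net/projects/hadoop/bin/compute-splitsize input 16` input output
```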