
MapReduce Tutorial : Running jobs

The input of a Hadoop job is either a file or a directory. In the latter case, all files in the directory are processed.

The output of a Hadoop job must be a directory that does not exist yet; the job fails if it already exists.
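For example, a complete local run might look like this, using the Perl invocation described below (script name and paths are only illustrative):

  # The input may be a single file or a directory of files.
  # The output directory must not exist yet, so remove any leftover from a previous run.
  rm -rf output
  perl script.pl input output
  # The results are created as files inside the new output directory.
  ls output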

Run Perl jobs

Choosing mode of operation:

Run locally:
  perl script.pl input output
Run using a specified jobtracker:
  perl script.pl -jt jobtracker:port input output
Run the job in a dedicated cluster:
  perl script.pl -c number_of_machines input output
Run the job in a dedicated cluster and, after it finishes, wait W seconds before stopping the cluster:
  perl script.pl -c number_of_machines -w W_seconds input output
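For instance, to run a job on a dedicated cluster of 4 machines and keep the cluster alive for 600 seconds after the job finishes (the numbers are illustrative; presumably the wait is useful for submitting further jobs to the still-running cluster with -jt):

  perl script.pl -c 4 -w 600 input output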

Specifying number of mappers and reducers:

Run using R reducers (R>1 does not work when running locally):
  perl script.pl -r R input output
Run using M mappers:
  perl script.pl `/net/projects/hadoop/bin/compute-splitsize input M` input output
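The backquotes substitute the output of compute-splitsize into the command line; presumably the helper prints job options that set the input split size so that the input is divided into roughly M splits. Assuming -r and the computed options can be combined, a job with 8 reducers and roughly 32 mappers would be (illustrative numbers):

  perl script.pl -r 8 `/net/projects/hadoop/bin/compute-splitsize input 32` input output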

Run Java jobs

Choosing mode of operation:

Run locally:
  /net/projects/hadoop/bin/hadoop job.jar input output
Run using a specified jobtracker:
  /net/projects/hadoop/bin/hadoop job.jar -jt jobtracker:port input output
Run the job in a dedicated cluster:
  /net/projects/hadoop/bin/hadoop job.jar -c number_of_machines input output
Run the job in a dedicated cluster and, after it finishes, wait W seconds before stopping the cluster:
  /net/projects/hadoop/bin/hadoop job.jar -c number_of_machines -w W_seconds input output
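A concrete invocation analogous to the Perl example above, with a hypothetical jar name and illustrative numbers:

  /net/projects/hadoop/bin/hadoop wordcount.jar -c 4 -w 600 input output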

Specifying number of mappers and reducers:

Run using R reducers (R>1 does not work when running locally):
  /net/projects/hadoop/bin/hadoop job.jar -r R input output
Run using M mappers:
  /net/projects/hadoop/bin/hadoop job.jar `/net/projects/hadoop/bin/compute-splitsize input M` input output
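Assuming, as in the Perl case, that the options can be combined, a job with 8 reducers and roughly 32 mappers would be (illustrative numbers):

  /net/projects/hadoop/bin/hadoop job.jar -r 8 `/net/projects/hadoop/bin/compute-splitsize input 32` input output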
