Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:running-jobs [2012/02/05 21:24] straka

====== MapReduce Tutorial : Running jobs ======

The input of a Hadoop job is either a file or a directory. In the latter case, all files in the directory are processed.

The output of a Hadoop job is a directory, which must not exist yet.

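The two rules above can be checked before launching a job. A minimal sketch, assuming a Perl job as described below; ''my-input'', ''my-output'' and ''script.pl'' are placeholder names:

```shell
# Placeholder paths; the input may be a single file or a directory
# (with a directory input, all files inside it are processed).
INPUT=my-input
OUTPUT=my-output

# The output directory must not exist before the job starts,
# so refuse to run if a previous run left it behind:
if [ -e "$OUTPUT" ]; then
  echo "Output directory $OUTPUT already exists, remove it first." >&2
  exit 1
fi

perl script.pl "$INPUT" "$OUTPUT"
```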
===== Running jobs =====

| ^ Command ^
^ Run Perl script ''script.pl'' | ''perl script.pl'' //options// |
^ Run Java job ''job.jar'' | ''/net/projects/hadoop/bin/hadoop job.jar'' //options// |

The options are the same for Perl and Java:

| ^ Options ^
^ Run locally | ''input output'' |
^ Run using specified jobtracker | ''-jt jobtracker:port input output'' |
^ Run job in dedicated cluster | ''-c number_of_machines input output'' |
^ Run job in dedicated cluster and after it finishes, \\ wait for //W// seconds before stopping the cluster | ''-c number_of_machines -w W_seconds input output'' |
^ Run using //R// reducers \\ (//R// > 1 does not work when running locally) | ''-r R input output'' |
^ Run using //M// mappers | ''`/net/projects/hadoop/bin/compute-splitsize input M` input output'' |

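A few hedged examples of combining the options above; ''job.jar'', ''script.pl'', ''in'' and ''out'' are placeholder names:

```shell
# Run a Perl job locally (note that more than one reducer
# does not work when running locally):
perl script.pl in out

# Run a Java job on a dedicated 10-machine cluster with 5 reducers,
# keeping the cluster up for 60 seconds after the job finishes:
/net/projects/hadoop/bin/hadoop job.jar -c 10 -w 60 -r 5 in out

# Run with roughly 8 mappers, letting compute-splitsize choose
# a split size for the given input:
/net/projects/hadoop/bin/hadoop job.jar \
  `/net/projects/hadoop/bin/compute-splitsize in 8` in out
```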
===== Running multiple jobs =====
There are several ways of running multiple jobs:
  * Java only: create multiple ''Job'' instances and call ''submit'' or ''waitForCompletion'' multiple times.
  * Create a cluster using ''/net/projects/hadoop/bin/hadoop-cluster'', parse the jobtracker:port from its output using ''head -1'' and run the jobs using ''-jt jobtracker:port''.
  * Create a shell script running multiple jobs using ''-jt HADOOP_JOBTRACKER''. Then run it using ''/net/projects/hadoop/bin/hadoop-cluster -c machines script.sh''.
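
The last approach can be sketched as follows, assuming ''HADOOP_JOBTRACKER'' is an environment variable holding the jobtracker:port inside the script started by ''hadoop-cluster''; the job jars and data paths are placeholders:

```shell
# script.sh -- to be started as:
#   /net/projects/hadoop/bin/hadoop-cluster -c machines script.sh
# Chain two jobs on the same dedicated cluster: the output directory
# of the first job is the input directory of the second.
/net/projects/hadoop/bin/hadoop first.jar -jt $HADOOP_JOBTRACKER input intermediate
/net/projects/hadoop/bin/hadoop second.jar -jt $HADOOP_JOBTRACKER intermediate output
```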