Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:running-jobs [2012/02/05 21:24] straka
The output of a Hadoop job must be a directory that does not exist yet.
  
===== Running jobs =====

| ^ Command ^
^ Run Perl script ''script.pl'' | ''perl script.pl'' //options// |
^ Run Java job ''job.jar'' | ''/net/projects/hadoop/bin/hadoop job.jar'' //options// |
The options are the same for Perl and Java:

| ^ Options ^
^ Run locally | ''input output'' |
^ Run using a specified jobtracker | ''-jt jobtracker:port input output'' |
^ Run the job in a dedicated cluster | ''-c number_of_machines input output'' |
^ Run the job in a dedicated cluster and, after it finishes, \\ wait //W// seconds before stopping the cluster | ''-c number_of_machines -w W_seconds input output'' |
^ Run using //R// reducers \\ (//R//>1 does not work when running locally) | ''-r R input output'' |
^ Run using //M// mappers | ''`/net/projects/hadoop/bin/compute-splitsize input M` input output'' |
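As a concrete illustration, several of the options above can be combined on one command line. A minimal sketch: the script name ''wordcount.pl'', the machine count, the wait time, and the reducer count are all illustrative, not taken from this page.

```shell
# Hypothetical example: compose the command for running a Perl job in a
# dedicated cluster of 10 machines with 5 reducers, keeping the cluster
# alive for 300 seconds after the job finishes.
SCRIPT=wordcount.pl           # example script name, not from the original
CLUSTER_OPTS="-c 10 -w 300"   # 10 machines, wait 300 s before stopping the cluster
REDUCER_OPTS="-r 5"           # 5 reducers (R>1 only works on a cluster, not locally)
CMD="perl $SCRIPT $CLUSTER_OPTS $REDUCER_OPTS input output"
echo "$CMD"
```

The same options would be passed unchanged to a Java job by replacing ''perl wordcount.pl'' with ''/net/projects/hadoop/bin/hadoop job.jar''.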
  
===== Running multiple jobs =====
There are several ways of running multiple jobs:
  * Java only: create multiple ''Job'' instances and call ''submit'' or ''waitForCompletion'' multiple times.
  * Create a cluster using ''/net/projects/hadoop/bin/hadoop-cluster'', parse the jobtracker:port from the first line of its output using ''head -1'', and run the jobs using ''-jt jobtracker:port''.
  * Create a shell script that runs multiple jobs using ''-jt HADOOP_JOBTRACKER''. Then run it using ''/net/projects/hadoop/bin/hadoop-cluster -c machines script.sh''.
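The second approach above can be sketched as a short shell session. Everything specific here is an assumption: the job names ''job1.pl''/''job2.pl'', the intermediate directory, and the hard-coded jobtracker address (which in a real run would be parsed from the first line of ''hadoop-cluster'' output) are hypothetical.

```shell
# Sketch of running two chained jobs on one dedicated cluster.
# In a real run, JT would be parsed from hadoop-cluster's output, roughly:
#   JT=$(/net/projects/hadoop/bin/hadoop-cluster ... | head -1)
# Here it is hard-coded so the snippet is self-contained.
JT="machine17:9001"
JOB1="perl job1.pl -jt $JT input intermediate"
JOB2="perl job2.pl -jt $JT intermediate output"
echo "$JOB1"
echo "$JOB2"
```

Chaining through a shared jobtracker avoids starting and stopping a cluster for every job; the output directory of the first job serves as the input directory of the second.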
