Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:running-jobs [2012/02/05 20:00]
straka
====== MapReduce Tutorial : Running jobs ======

The input of a Hadoop job is either a file or a directory. In the latter case, all files in the directory are processed.

The output of a Hadoop job must be a directory that does not exist yet; the job creates it.

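Since the job will not run with an existing output directory, it can be convenient to guard the invocation. A minimal sketch, assuming a hypothetical ''run_job'' wrapper around the ''perl script.pl'' call used throughout this page:

```shell
# Sketch: refuse to start when the output directory already exists.
# run_job is a hypothetical wrapper name; "perl script.pl" is the
# invocation used in the tables below.
run_job() {
  local input=$1 output=$2
  if [ -e "$output" ]; then
    echo "error: output '$output' already exists, remove it first" >&2
    return 1
  fi
  perl script.pl "$input" "$output"
}
```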
===== Run Perl jobs =====
Choosing the mode of operation:
| ^ Command ^
^ Run locally | ''perl script.pl input output'' |
^ Run using a specified jobtracker | ''perl script.pl -jt jobtracker:port input output'' |
^ Run the job in a dedicated cluster | ''perl script.pl -c number_of_machines input output'' |
^ Run the job in a dedicated cluster and, after it finishes, \\ wait //W// seconds before stopping the cluster | ''perl script.pl -c number_of_machines -w W_seconds input output'' |

Specifying the number of mappers and reducers:
| ^ Command ^
^ Run using //R// reducers \\ (//R// > 1 does not work when running locally) | ''perl script.pl -r R input output'' |
^ Run using //M// mappers | ''perl script.pl `/net/projects/hadoop/bin/compute-splitsize input M` input output'' |
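The number of mappers is controlled indirectly through the input split size. A sketch of the arithmetic presumably involved (the numbers are illustrative assumptions; the real ''compute-splitsize'' script's output format may differ):

```shell
# Illustrative only: derive a per-split byte size so that an input of
# total_bytes is divided into M mapper splits. Ceiling division
# guarantees the M splits cover the whole input.
total_bytes=1000000   # assumed total size of "input"
M=8                   # desired number of mappers
splitsize=$(( (total_bytes + M - 1) / M ))
echo "$splitsize"     # 125000 bytes per split
```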
