Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-24 [2012/01/27 20:56]
straka
+++ courses:mapreduce-tutorial:step-24 [2012/01/27 21:28]
straka
@@ Line 18: / Line 18: @@
     public void setup(Context context) {
     }
     public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
       if (key.getLength() > 0 && Character.toUpperCase(key.charAt(0)) == 'A') {
@@ Line 26: / Line 26: @@
     public void cleanup(Context context) {
     }
   }
   // Job configuration
   public int run(String[] args) throws Exception {
@@ Line 35: / Line 35: @@
       return 1;
     }
     Job job = new Job(getConf(), this.getClass().getName());
     job.setJarByClass(this.getClass());
     job.setMapperClass(TheMapper.class);
     job.setOutputKeyClass(Text.class);
     job.setOutputValueClass(Text.class);
     job.setInputFormatClass(KeyValueTextInputFormat.class);
     FileInputFormat.addInputPath(job, new Path(args[0]));
     FileOutputFormat.setOutputPath(job, new Path(args[1]));
     return job.waitForCompletion(true) ? 0 : 1;
   }
@@ Line 57: / Line 57: @@
     System.exit(res);
   }
 }
 </file>
+===== Running the job =====
+Download the source and compile it.
+The official way of running Hadoop jobs is to use the ''/SGE/HADOOP/active/bin/hadoop'' script. Jobs submitted through this script can be configured using Hadoop properties only. Therefore, a wrapper script is provided, with options similar to those of the Perl API runner:
+  * ''net/projects/hadoop/bin/hadoop [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- executes the given job locally in a single thread, which is useful for debugging.
+  * ''net/projects/hadoop/bin/hadoop -jt cluster_master [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- submits the job to the given ''cluster_master''.
+  * ''net/projects/hadoop/bin/hadoop -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] job.jar [generic Hadoop properties] input_path output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.
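The three invocation forms of the wrapper differ only in the options that precede ''job.jar''. The following Java sketch assembles the argument list for each form, for example to hand it to ''ProcessBuilder''. The class ''HadoopWrapperCommand'', its ''Mode'' enum and the ''build'' method are hypothetical illustrations, not part of the course code or of the wrapper script; the wrapper path is passed in as a parameter rather than assumed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the wrapper-script argument list for the three documented modes. */
public class HadoopWrapperCommand {
    // The three invocation forms of the wrapper script.
    enum Mode { LOCAL, EXISTING_CLUSTER, NEW_CLUSTER }

    static List<String> build(String wrapper, Mode mode, String clusterMaster,
                              int machines, Integer waitSecs, Integer reducers,
                              String jar, List<String> hadoopProps,
                              String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add(wrapper);
        switch (mode) {
            case LOCAL:
                break; // no extra option: the job runs locally in a single thread
            case EXISTING_CLUSTER:
                cmd.add("-jt");          // submit to an already running cluster
                cmd.add(clusterMaster);
                break;
            case NEW_CLUSTER:
                cmd.add("-c");           // start a new cluster with this many machines
                cmd.add(Integer.toString(machines));
                if (waitSecs != null) {  // optional wait after the job finishes
                    cmd.add("-w");
                    cmd.add(Integer.toString(waitSecs));
                }
                break;
        }
        if (reducers != null) {          // optional number of reducers, valid in all modes
            cmd.add("-r");
            cmd.add(Integer.toString(reducers));
        }
        cmd.add(jar);
        cmd.addAll(hadoopProps);         // generic Hadoop properties go after the jar
        cmd.add(input);
        cmd.add(output);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("net/projects/hadoop/bin/hadoop", Mode.NEW_CLUSTER, null,
                10, 600, 2, "job.jar", List.of(), "input_path", "output_path");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the mode as an explicit enum mirrors the fact that ''-jt'' and ''-c'' select mutually exclusive forms, while ''-r'' is shared by all three.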


Institute of Formal and Applied Linguistics Wiki
