
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-24 [2012/01/27 21:41]
straka
We start by going through a simple Hadoop job with Mapper only.
  
A mapper which processes (key, value) pairs of types (Kin, Vin) and produces (key, value) pairs of types (Kout, Vout) must be a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Mapper.html|Mapper<Kin, Vin, Kout, Vout>]]. In our case, the mapper is a subclass of ''Mapper<Text, Text, Text, Text>''.
  
The mapper must define a ''map'' method and may provide ''setup'' and ''cleanup'' methods:
<code java>
  public static class TheMapper extends Mapper<Text, Text, Text, Text> {
    public void setup(Context context) throws IOException, InterruptedException {}

    public void map(Text key, Text value, Context context) throws IOException, InterruptedException {}

    public void cleanup(Context context) throws IOException, InterruptedException {}
  }
</code>
  
Outputting (key, value) pairs is performed using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/MapContext.html|MapContext<Kin, Vin, Kout, Vout>]] object (''Context'' is an abbreviation for this type), with the method ''context.write(Kout key, Vout value)''.
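As an illustration of ''context.write'', here is a sketch of a ''map'' method that emits only the pairs whose key starts with ''A'' (the class and behaviour are illustrative, not part of the job below):

<code java>
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch: forward only the (key, value) pairs whose key starts with "A".
public static class StartsWithAMapper extends Mapper<Text, Text, Text, Text> {
  public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
    if (key.toString().startsWith("A"))
      context.write(key, value);   // Emit the pair unchanged.
  }
}
</code>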
  
Here is the source of the whole Hadoop job:
  
<file java MapperOnlyHadoopJob.java>
    }
  
    Job job = new Job(getConf(), this.getClass().getName());    // Create class representing Hadoop job.

    job.setJarByClass(this.getClass());                         // Use jar containing current class.
    job.setMapperClass(TheMapper.class);                        // The mapper of the job.
    job.setOutputKeyClass(Text.class);                          // Type of the output keys.
    job.setOutputValueClass(Text.class);                        // Type of the output values.

    job.setInputFormatClass(KeyValueTextInputFormat.class);     // Input format.
                                                                // Output format is the default -- TextOutputFormat.

    FileInputFormat.addInputPath(job, new Path(args[0]));       // Input path is on command line.
    FileOutputFormat.setOutputPath(job, new Path(args[1]));     // Output path is on command line too.
  
    return job.waitForCompletion(true) ? 0 : 1;
  }
}
</file>
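The code above is the body of a ''run'' method of a class implementing Hadoop's ''Tool'' interface (which is where ''getConf()'' comes from). A typical entry point for such a class — a sketch under that assumption, not shown in the excerpt above — dispatches to ''run'' through ''ToolRunner'', which also parses the generic Hadoop command-line options:

<code java>
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapperOnlyHadoopJob extends Configured implements Tool {
  // ... TheMapper and the run method shown above ...

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new MapperOnlyHadoopJob(), args);  // Parses generic options, then calls run().
    System.exit(res);
  }
}
</code>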
