
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-24 [2012/01/27 21:41]
straka
We start by going through a simple Hadoop job with Mapper only.
  
A mapper which processes (key, value) pairs of types (Kin, Vin) and produces (key, value) pairs of types (Kout, Vout) must be a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Mapper.html|Mapper<Kin, Vin, Kout, Vout>]]. In our case, the mapper is a subclass of ''Mapper<Text, Text, Text, Text>''.
  
The mapper must define a ''map'' method and may provide ''setup'' and ''cleanup'' methods:
<code java>
  public static class TheMapper extends Mapper<Text, Text, Text, Text> {
    public void setup(Context context) throws IOException, InterruptedException {}

    public void map(Text key, Text value, Context context) throws IOException, InterruptedException {}

    public void cleanup(Context context) throws IOException, InterruptedException {}
  }
</code>
  
Outputting (key, value) pairs is performed using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/MapContext.html|MapContext<Kin, Vin, Kout, Vout>]] object (''Context'' is an abbreviation for this type), with the method ''context.write(Kout key, Vout value)''.
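As an illustration of ''context.write'', here is a sketch of a ''map'' method that emits only the pairs whose key starts with ''A'' (the class and behaviour are illustrative, not part of the job below):

<code java>
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch: forward only the (key, value) pairs whose key starts with "A".
public static class StartsWithAMapper extends Mapper<Text, Text, Text, Text> {
  public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
    if (key.toString().startsWith("A"))
      context.write(key, value);   // Emit the pair unchanged.
  }
}
</code>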
  
Here is the source of the whole Hadoop job:
  
<file java MapperOnlyHadoopJob.java>
    }
  
    Job job = new Job(getConf(), this.getClass().getName());    // Create class representing Hadoop job.

    job.setJarByClass(this.getClass());                         // Use jar containing current class.
    job.setMapperClass(TheMapper.class);                        // The mapper of the job.
    job.setOutputKeyClass(Text.class);                          // Type of the output keys.
    job.setOutputValueClass(Text.class);                        // Type of the output values.

    job.setInputFormatClass(KeyValueTextInputFormat.class);     // Input format.
                                                                // Output format is the default -- TextOutputFormat.

    FileInputFormat.addInputPath(job, new Path(args[0]));       // Input path is on command line.
    FileOutputFormat.setOutputPath(job, new Path(args[1]));     // Output path is on command line too.
  
    return job.waitForCompletion(true) ? 0 : 1;
  }
}
</file>
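The code above is the body of a ''run'' method of a class implementing Hadoop's ''Tool'' interface (which is where ''getConf()'' comes from). A typical entry point for such a class — a sketch under that assumption, not shown in the excerpt above — dispatches to ''run'' through ''ToolRunner'', which also parses the generic Hadoop command-line options:

<code java>
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapperOnlyHadoopJob extends Configured implements Tool {
  // ... TheMapper and the run method shown above ...

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new MapperOnlyHadoopJob(), args);  // Parses generic options, then calls run().
    System.exit(res);
  }
}
</code>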
