Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-24 [2012/01/27 21:41] straka |
courses:mapreduce-tutorial:step-24 [2012/01/27 22:01] straka |
||
---|---|---|---|
Line 3: | Line 3: | ||
We start by going through a simple Hadoop job with a Mapper only. | We start by going through a simple Hadoop job with a Mapper only. | ||
- | A mapper which processes (key, value) pairs of types (Kin, Vin) and produces (key, value) pairs of types (Kout, Vout) must be a subclass of [[http:// | + | A mapper which processes (key, value) pairs of types (Kin, Vin) and produces (key, value) pairs of types (Kout, Vout) must be a subclass of [[http:// |
+ | The mapper must define a '' | ||
+ | <code java> | ||
+ | public static class TheMapper extends Mapper< | ||
+ | public void setup(Context context) throws IOException, | ||
+ | |||
+ | public void map(Text key, Text value, Context context) throws IOException, | ||
+ | |||
+ | public void cleanup(Context context) throws IOException, | ||
+ | } | ||
+ | </ | ||
- | http:// | + | Outputting (key, value) pairs is performed using the [[http:// |
- | The Mapper outputs only keys starting with '' | + | Here is the source of the whole Hadoop job: |
<file java MapperOnlyHadoopJob.java> | <file java MapperOnlyHadoopJob.java> | ||
Line 44: | Line 54: | ||
} | } | ||
- | Job job = new Job(getConf(), | + | Job job = new Job(getConf(), |
+ | // The name of the job is the name of the current class. | ||
- | job.setJarByClass(this.getClass()); | + | job.setJarByClass(this.getClass()); |
- | job.setMapperClass(TheMapper.class); | + | job.setMapperClass(TheMapper.class); |
- | job.setOutputKeyClass(Text.class); | + | job.setOutputKeyClass(Text.class); |
- | job.setOutputValueClass(Text.class); | + | job.setOutputValueClass(Text.class); |
- | job.setInputFormatClass(KeyValueTextInputFormat.class); | + | job.setInputFormatClass(KeyValueTextInputFormat.class); |
+ | // Output format is the default -- TextOutputFormat | ||
- | FileInputFormat.addInputPath(job, | + | FileInputFormat.addInputPath(job, |
- | FileOutputFormat.setOutputPath(job, | + | FileOutputFormat.setOutputPath(job, |
return job.waitForCompletion(true) ? 0 : 1; | return job.waitForCompletion(true) ? 0 : 1; | ||
Line 67: | Line 79: | ||
} | } | ||
</ | </ | ||
+ | |||
+ | Remarks: | ||
+ | * The filename //must// be the same as the name of the public class it contains -- this is enforced by the Java compiler. | ||
+ | * Multiple jobs can be submitted from one class, either in sequence or in parallel. | ||
+ | * A mismatch of types is usually detected by the compiler, but sometimes it is detected only at runtime. If that happens, an exception is raised and the program crashes. | ||
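The last remark can be illustrated without Hadoop. The following is a minimal plain-Java sketch, not Hadoop's actual implementation: `TypedCollector` and `TypeMismatchDemo` are hypothetical names, and the collector only stands in for a framework that learns the expected key class from configuration (compare `job.setOutputKeyClass(Text.class)` in the job above). Because the class is supplied as a value rather than as a compile-time type, the compiler cannot see a mismatch, so it surfaces only while the program runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a framework that is told the expected key class
// at configuration time instead of through a compile-time type.
class TypedCollector {
    private final Class<?> declaredKeyClass;
    private final List<Object> keys = new ArrayList<>();

    TypedCollector(Class<?> declaredKeyClass) {
        this.declaredKeyClass = declaredKeyClass;
    }

    // The parameter type is Object, so the compiler accepts any key.
    // A mismatch with the declared class can only be caught here, at runtime.
    void write(Object key) {
        if (!declaredKeyClass.isInstance(key)) {
            String actual = (key == null) ? "null" : key.getClass().getName();
            throw new IllegalStateException("Type mismatch in key: expected "
                    + declaredKeyClass.getName() + ", received " + actual);
        }
        keys.add(key);
    }

    int size() {
        return keys.size();
    }
}

public class TypeMismatchDemo {
    public static void main(String[] args) {
        TypedCollector collector = new TypedCollector(String.class);
        collector.write("a key"); // matches the declared class
        try {
            collector.write(Integer.valueOf(42)); // compiles, but fails at runtime
        } catch (IllegalStateException e) {
            System.out.println("Runtime failure: " + e.getMessage());
        }
    }
}
```

In this sketch the exception is caught so that the demo can continue; if it were left uncaught, the program would terminate with the exception, which is the behaviour the remark describes.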
===== Running the job ===== | ===== Running the job ===== |