courses:mapreduce-tutorial:step-24 [2012/01/27 22:20] straka
====== MapReduce Tutorial : Mappers, running Java Hadoop jobs ======
We start by going through a simple Hadoop job consisting of a mapper only.

A mapper which processes (key, value) pairs of types (Kin, Vin) and produces (key, value) pairs of types (Kout, Vout) must be a subclass of ''org.apache.hadoop.mapreduce.Mapper<Kin, Vin, Kout, Vout>''.

The mapper must define a ''map'' method, and may optionally define ''setup'' and ''cleanup'' methods, which are called before the first and after the last input pair is processed, respectively:
<code java>
public static class TheMapper extends Mapper<Text, Text, Text, Text> {
  public void setup(Context context) throws IOException, InterruptedException {}

  public void map(Text key, Text value, Context context) throws IOException, InterruptedException {}

  public void cleanup(Context context) throws IOException, InterruptedException {}
}
</code>
+ | |||
+ | Outputting (key, value) pairs is performed using the [[http:// | ||
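To make the ''map'' contract concrete, here is a standalone plain-Java sketch. It has no Hadoop dependency: the ''Collector'' interface below is a made-up stand-in for Hadoop's ''Context'', used only to illustrate how a mapper turns each input pair into zero or more output pairs.

<code java>
import java.util.ArrayList;
import java.util.List;

public class MapSketch {
    // Hypothetical stand-in for Hadoop's Mapper.Context (illustration only).
    interface Collector { void write(String key, String value); }

    // What a map(Text key, Text value, Context context) body might do:
    // emit one output pair per input pair, here upper-casing the value.
    static void map(String key, String value, Collector out) {
        out.write(key, value.toUpperCase());
    }

    public static void main(String[] args) {
        List<String> pairs = new ArrayList<>();
        map("line1", "hello hadoop", (k, v) -> pairs.add(k + "\t" + v));
        System.out.println(pairs.get(0));  // line1	HELLO HADOOP
    }
}
</code>

In a real mapper the same logic lives directly in the ''map'' method and calls ''context.write''.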
+ | |||
+ | Here is the source of the whole Hadoop job: | ||
<file java MapperOnlyHadoopJob.java>
import java.io.IOException;

// ... (remaining imports and the TheMapper class from above) ...

    Job job = new Job(getConf(), this.getClass().getName());
    // Name of the job is the name of current class.
    job.setJarByClass(this.getClass());
    job.setMapperClass(TheMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    // Output format is the default -- TextOutputFormat.

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    return job.waitForCompletion(true) ? 0 : 1;

// ... (rest of the class) ...
</file>
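The elided parts of the file include the class declaration and the program entry point. Since the visible code calls ''getConf()'', the class presumably extends ''Configured'' and implements ''Tool''; the following is a sketch of the standard ''ToolRunner'' wiring for such a job, not necessarily the tutorial's exact code:

<code java>
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch of the usual Tool/ToolRunner pattern (assumed, not shown in the page above).
public class MapperOnlyHadoopJob extends Configured implements Tool {
    // TheMapper class and the run method shown above go here.

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options (-D, -files, ...) before calling run.
        System.exit(ToolRunner.run(new MapperOnlyHadoopJob(), args));
    }
}
</code>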
+ | |||
+ | Remarks: | ||
+ | * The filename //must// be the same as the name of the class -- this is enforced by Java compiler. | ||
+ | * In one class multiple jobs can be submitted, either in sequence or in parallel. | ||
+ | * A mismatch of types is usually detected by the compiler, but sometimes it is detected only at runtime. If that happens, an exception is raised and the program crashes. For example, default key output class it '' | ||
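The same compile-time versus runtime distinction can be demonstrated in plain Java, without Hadoop: because generics are erased, a type error the compiler cannot see surfaces only as a ''ClassCastException'' when the value is used. This standalone analogue (not Hadoop code) shows the effect:

<code java>
import java.util.ArrayList;
import java.util.List;

public class TypeMismatchSketch {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List raw = strings;  // raw type: the compiler only warns
        raw.add(42);         // wrong element type slips through compilation

        try {
            String s = strings.get(0);  // the implicit cast fails here, at runtime
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at runtime");
        }
    }
}
</code>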
===== Running the job =====
The official way of running Hadoop jobs is to use the ''/…'' script. It can run a job in one of three modes:
  * ''…''
  * ''…''
  * ''…''
+ | |||
+ | ===== Exercise ===== | ||
+ | Download the '' | ||
+ | / | ||

Mind the ''…'':
  * When using ''…''
  * When not specifying ''…''