MapReduce Tutorial : Running multiple Hadoop jobs in one source file
The Java API offers the possibility of submitting multiple Hadoop jobs from one source file. A job can be submitted in either of two ways:
- job.waitForCompletion – the job is submitted and the method blocks until it finishes (successfully or not).
- job.submit – the job is submitted and the method returns immediately. In this case, the state of the submitted job can be queried using job.isComplete and job.isSuccessful.
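As a sketch of the two submission modes, the following driver chains two jobs in one source file: the first is submitted with waitForCompletion (blocking), the second with submit and a polling loop. The class name, job names, and paths are illustrative, not part of the tutorial; this is a minimal outline, not a complete solution.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // First job: submit and block until it finishes.
        Job first = Job.getInstance(conf, "first-job");
        first.setJarByClass(TwoJobs.class);
        // ... set mapper, reducer, and key/value classes here ...
        FileInputFormat.addInputPath(first, new Path(args[0]));
        FileOutputFormat.setOutputPath(first, new Path(args[1] + "/intermediate"));
        if (!first.waitForCompletion(true)) {   // true = report progress to stdout
            System.exit(1);                     // stop if the first job failed
        }

        // Second job: submit asynchronously, then poll its state.
        Job second = Job.getInstance(conf, "second-job");
        second.setJarByClass(TwoJobs.class);
        // ... set mapper, reducer, and key/value classes here ...
        FileInputFormat.addInputPath(second, new Path(args[1] + "/intermediate"));
        FileOutputFormat.setOutputPath(second, new Path(args[1] + "/output"));
        second.submit();                        // returns immediately
        while (!second.isComplete()) {          // poll until the job finishes
            Thread.sleep(5000);
        }
        System.exit(second.isSuccessful() ? 0 : 1);
    }
}
```

The blocking form is simpler when jobs depend on each other's output; submit is useful when several independent jobs can run concurrently.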
Exercise 1
Improve the sorting exercise to handle a nonuniform key distribution. As in the Perl solution, run two Hadoop jobs (from one Java source file) – the first samples the input and computes the separators, the second performs the real sorting.
Exercise 2
Implement the K-means clustering exercise in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
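The iterative driver this exercise asks for can be sketched as a loop that submits one blocking job per iteration and stops when the cluster centers no longer move. The helper centersMoved and the per-iteration paths are hypothetical placeholders, not part of the tutorial; this only outlines the control flow, not the K-means mapper and reducer themselves.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KMeansDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int iteration = 0;
        boolean converged = false;

        while (!converged) {
            Job job = Job.getInstance(conf, "kmeans-iteration-" + iteration);
            job.setJarByClass(KMeansDriver.class);
            // ... set the K-means mapper, reducer, and key/value classes here ...
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // Write each iteration's centers to its own output directory.
            FileOutputFormat.setOutputPath(job, new Path(args[1] + "/iter-" + iteration));
            if (!job.waitForCompletion(true)) { // block until this iteration ends
                System.exit(1);                 // abort on failure
            }
            // Hypothetical helper: read the new centers from the job output,
            // compare them with the previous iteration's, and report movement.
            converged = !centersMoved(conf, args[1], iteration);
            iteration++;
        }
    }

    // Stub for illustration only: a real implementation would read the center
    // files from HDFS and compare coordinates against a tolerance.
    private static boolean centersMoved(Configuration conf, String base, int iteration) {
        throw new UnsupportedOperationException("left as part of the exercise");
    }
}
```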
Step 26: Compression and job configuration. | Overview | Step 28: Custom data types. |