courses:mapreduce-tutorial:step-28 [2012/01/31 12:40] straka |
courses:mapreduce-tutorial:step-28 [2012/01/31 13:12] straka |
====== MapReduce Tutorial : Running multiple Hadoop jobs in one class ====== | ====== MapReduce Tutorial : Running multiple Hadoop jobs in one source file ====== |
| |
The Java API makes it possible to submit multiple Hadoop jobs in one class. A job can be submitted using either | The Java API makes it possible to submit multiple Hadoop jobs in one source file. A job can be submitted using either |
* [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not). | * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not). |
* [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be queried using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]]. | * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be queried using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]]. |
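The two submission styles above can be illustrated without a cluster. The following is only a sketch: ''JobLike'' and ''FakeJob'' are hypothetical stand-ins that mirror the three method names mentioned above (submit, isComplete, isSuccessful), not the Hadoop ''Job'' class itself, and the real methods declare checked exceptions that are left out here. It shows one plausible way to chain jobs in a single source file: submit, poll until complete, and start the next job only if the previous one succeeded.

```java
import java.util.List;

public class JobChainSketch {
    /** Hypothetical stand-in for the Job methods named above; NOT the Hadoop class. */
    interface JobLike {
        void submit();
        boolean isComplete();
        boolean isSuccessful();
    }

    /** Submits the job, polls until it completes, and reports whether it succeeded. */
    static boolean runAndWait(JobLike job, long pollMillis) {
        job.submit();
        while (!job.isComplete()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return false;
            }
        }
        return job.isSuccessful();
    }

    /** Runs the jobs in order and stops at the first one that fails. */
    static boolean runInSequence(List<? extends JobLike> jobs, long pollMillis) {
        for (JobLike job : jobs) {
            if (!runAndWait(job, pollMillis)) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical fake job: reports completion after a fixed number of polls. */
    static final class FakeJob implements JobLike {
        private final boolean succeeds;
        private int pollsLeft;
        private boolean submitted = false;

        FakeJob(boolean succeeds, int pollsUntilDone) {
            this.succeeds = succeeds;
            this.pollsLeft = pollsUntilDone;
        }

        public void submit() { submitted = true; }

        public boolean isComplete() {
            if (!submitted) return false;
            if (pollsLeft > 0) { pollsLeft--; return false; }
            return true;
        }

        public boolean isSuccessful() { return submitted && pollsLeft == 0 && succeeds; }

        boolean wasSubmitted() { return submitted; }
    }

    public static void main(String[] args) {
        boolean ok = runInSequence(List.of(new FakeJob(true, 2), new FakeJob(true, 0)), 1L);
        System.out.println("all jobs succeeded: " + ok);
    }
}
```

With the real API, the blocking variant needs no polling loop, because job.waitForCompletion already waits for the job to finish; the polling form is only relevant after job.submit.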
| |
===== Exercise 1 ===== | ===== Exercise 1 ===== |
| Improve the [[.:step-25#exercise|sorting exercise]] to handle [[.:step-13#nonuniform-data|nonuniform key distribution]]. As in the [[.:step-13#nonuniform-data|Perl solution]], run two Hadoop jobs (using one Java source file) -- the first samples the input and creates the separators, the second does the actual sorting. |
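As a hint for the sampling step, the separator logic can be tried out in plain Java before it is wired into Hadoop. This is a minimal sketch under assumptions: the class and method names are hypothetical, keys are taken to be strings, the sample is assumed to fit in memory, and separators are picked at evenly spaced positions of the sorted sample. How the separators are passed from the first job to the second is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SeparatorSketch {
    /** Picks parts-1 separators at evenly spaced positions of the sorted sample. */
    static List<String> chooseSeparators(List<String> sample, int parts) {
        if (sample.isEmpty() || parts < 1) {
            throw new IllegalArgumentException("need a non-empty sample and parts >= 1");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> separators = new ArrayList<>();
        for (int i = 1; i < parts; i++) {
            int index = (int) ((long) i * sorted.size() / parts);
            separators.add(sorted.get(index));
        }
        return separators;
    }

    /** Returns the part for a key: keys below the first separator go to part 0, and so on. */
    static int partitionFor(String key, List<String> separators) {
        int pos = Collections.binarySearch(separators, key);
        // Found: the key equals separator pos and belongs to the part starting there.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("d", "a", "c", "b", "f", "e", "h", "g");
        List<String> separators = chooseSeparators(sample, 4);
        System.out.println(separators + " -> part of \"f\": " + partitionFor("f", separators));
    }
}
```

Because every key in part i compares below every key in part i+1, concatenating the sorted outputs of the parts in order gives a globally sorted result.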
| |
Improve the last [[.:step-27#exercise|inverted index creation exercise]], such that | ===== Exercise 2 ===== |
| |
| Improve the [[.:step-27#exercise|inverted index creation exercise]], such that |
- in the first job, create a list of unique document names. Number the documents using the order in this list. | - in the first job, create a list of unique document names. Number the documents using the order in this list. |
- in the second job, create for each word a sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document). | - in the second job, create for each word a sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document). |
| |
===== Exercise 2 ===== | ===== Exercise 3 ===== |
| |
Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary. | Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary. |
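The control loop that replaces the script might look roughly like the sketch below. Everything in it is an assumption made for illustration: the names are hypothetical, the Hadoop job is replaced by an in-memory one-dimensional K-means step, and "as many times as necessary" is interpreted as "until no center moves by more than epsilon, or a maximum number of iterations is reached". In a real solution, ''runJobOnce'' would configure and run one Hadoop job and read the new centers from its output.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class KMeansDriverSketch {
    /** Repeats the job until the centers stop moving or maxIterations is reached. */
    static double[] iterateUntilStable(double[] initial, UnaryOperator<double[]> runJobOnce,
                                       double epsilon, int maxIterations) {
        double[] centers = initial.clone();
        for (int i = 0; i < maxIterations; i++) {
            double[] updated = runJobOnce.apply(centers);
            double shift = 0.0;
            for (int c = 0; c < centers.length; c++) {
                shift = Math.max(shift, Math.abs(updated[c] - centers[c]));
            }
            centers = updated;
            if (shift <= epsilon) {
                break; // converged: the last run changed no center noticeably
            }
        }
        return centers;
    }

    /** In-memory stand-in for one job run: assign points to the nearest center, then average. */
    static double[] inMemoryStep(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                    best = c;
                }
            }
            sum[best] += p;
            count[best]++;
        }
        double[] updated = new double[centers.length];
        for (int c = 0; c < centers.length; c++) {
            // An empty cluster keeps its previous center.
            updated[c] = count[c] == 0 ? centers[c] : sum[c] / count[c];
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] result = iterateUntilStable(new double[] {0, 5},
                cs -> inMemoryStep(points, cs), 1e-9, 100);
        System.out.println(Arrays.toString(result));
    }
}
```

The iteration cap guards against a job that never converges; whether a cap or a different stopping rule is appropriate depends on the exercise data.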