====== MapReduce Tutorial : Running multiple Hadoop jobs in one class ======

The Java API offers the possibility of submitting multiple Hadoop jobs from one class. A job can be submitted using either of the following methods:
  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not).
  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be queried using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]].
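
The following minimal sketch (the class and job names are only illustrative) shows both ways of submitting a job:

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TwoJobsExample {
  public static void main(String[] args) throws Exception {
    // Blocking submission: waitForCompletion returns only after the job ends.
    Job first = new Job(new Configuration(), "first-job");
    // ... set the jar, mapper, reducer, input and output paths here ...
    if (!first.waitForCompletion(true)) System.exit(1);   // true = report progress

    // Non-blocking submission: submit returns immediately, so poll the job state.
    Job second = new Job(new Configuration(), "second-job");
    // ... set the jar, mapper, reducer, input and output paths here ...
    second.submit();
    while (!second.isComplete()) Thread.sleep(5000);      // check every 5 seconds
    if (!second.isSuccessful()) System.exit(1);
  }
}
</code>
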
===== Exercise 1 =====
Improve the [[.:step-25#exercise|sorting exercise]] to handle a [[.:step-13#nonuniform-data|nonuniform distribution of keys]]. As in the [[.:step-13#nonuniform-data|Perl solution]], run two Hadoop jobs (from a single Java source file) -- the first samples the input and creates the separators, the second performs the real sorting.
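
One possible shape of the driver is sketched below; the sampling and sorting job setup is elided, and the configuration key ''sort.separators'' is an assumption, not part of the exercise skeleton:

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoPassSort {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]), output = new Path(args[1]);
    Path separators = new Path(args[1] + "-separators");

    // First job: sample the input and write the separators of the key ranges.
    Job sampling = new Job(new Configuration(), "sample-keys");
    // ... set the jar, sampling mapper and reducer, key/value classes here ...
    FileInputFormat.addInputPath(sampling, input);
    FileOutputFormat.setOutputPath(sampling, separators);
    if (!sampling.waitForCompletion(true)) System.exit(1);

    // Second job: the real sorting; the partitioner can read the separators
    // from the path stored in the configuration (the key name is made up here).
    Job sorting = new Job(new Configuration(), "sort");
    sorting.getConfiguration().set("sort.separators", separators.toString());
    // ... set the jar, identity mapper and reducer, custom partitioner here ...
    FileInputFormat.addInputPath(sorting, input);
    FileOutputFormat.setOutputPath(sorting, output);
    System.exit(sorting.waitForCompletion(true) ? 0 : 1);
  }
}
</code>
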
===== Exercise 2 =====

Improve the [[.:step-27#exercise|inverted index creation exercise]] so that
  - in the first job, you create a list of unique document names and number the documents by their order in this list,
  - in the second job, you create for each word a sorted list of ''DocWithOccurences<IntWritable>'', where a document is identified by its number (in contrast to the previous exercise, where ''Text'' was used to identify the document); one way of passing the document numbering to the second job is sketched below.
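
The following sketch is only one possibility, not the required solution: the first job writes the document list, and its output file is shipped to the second job's mappers via the distributed cache, so they can load the name-to-number mapping in their ''setup'' method. The file name ''part-r-00000'' assumes a single reducer in the first job.

<code java>
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndexDriver {
  public static void main(String[] args) throws Exception {
    Path documentList = new Path(args[1] + "-doclist");

    // First job: produce the list of unique document names, one per line;
    // the position of a name in this list serves as the document number.
    Job naming = new Job(new Configuration(), "document-names");
    // ... set the jar, mapper, reducer, input path here ...
    FileOutputFormat.setOutputPath(naming, documentList);
    if (!naming.waitForCompletion(true)) System.exit(1);

    // Second job: build the inverted index; the document list is distributed
    // to every mapper through the distributed cache.
    Job indexing = new Job(new Configuration(), "inverted-index");
    DistributedCache.addCacheFile(new URI(documentList + "/part-r-00000"),
                                  indexing.getConfiguration());
    // ... set the jar, mapper, reducer, input and output paths here ...
    System.exit(indexing.waitForCompletion(true) ? 0 : 1);
  }
}
</code>
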
===== Exercise 3 =====

Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
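
A possible shape of the driver loop follows; the configuration key ''kmeans.clusters'', the argument layout and the fixed iteration count are only illustrative, and a real solution would rather test for convergence (e.g. using counters filled by the reducer):

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KMeansDriver {
  public static void main(String[] args) throws Exception {
    String clusters = args[2];                     // path of the initial cluster centers

    for (int iteration = 1; iteration <= 20; iteration++) {
      Configuration conf = new Configuration();
      conf.set("kmeans.clusters", clusters);       // mappers load the current centers

      Job job = new Job(conf, "kmeans-iteration-" + iteration);
      // ... set the jar, mapper, reducer, key/value classes here ...
      FileInputFormat.addInputPath(job, new Path(args[0]));
      String next = args[1] + "/clusters-" + iteration;
      FileOutputFormat.setOutputPath(job, new Path(next));
      if (!job.waitForCompletion(true)) System.exit(1);

      clusters = next;                             // the next iteration reads the new centers
      // Here one could compare the new and old centers and break once they stop moving.
    }
  }
}
</code>
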
----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-29|Step 29]]: Custom sorting and grouping comparators.<html></td>
</tr>
</table>
</html>
  