Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-28 [2012/01/29 16:38]
straka
courses:mapreduce-tutorial:step-28 [2012/01/31 13:12]
straka
Line 1: Line 1:
-====== MapReduce Tutorial : Running multiple Hadoop jobs in one class ======
+====== MapReduce Tutorial : Running multiple Hadoop jobs in one source file ======
  
-The Java API offers possibility to submit multiple Hadoop job in one class. A job can be submitted either using
+The Java API makes it possible to submit multiple Hadoop jobs from one source file. A job can be submitted using either
   * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not).
   * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method immediately returns. In this case, the state of the submitted job can be accessed using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]]
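
The polling approach from the second bullet can be sketched as follows. This is only an illustration that compiles without Hadoop: ''JobStatus'' is a hypothetical stand-in for the two status methods of ''org.apache.hadoop.mapreduce.Job'' (the real methods can also throw ''IOException''), and ''FakeJob'' and ''waitForAll'' are names made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class PollJobs {
    /** Hypothetical stand-in for the status methods of org.apache.hadoop.mapreduce.Job. */
    interface JobStatus {
        boolean isComplete();
        boolean isSuccessful();
    }

    /**
     * Waits until every already submitted job is complete.
     * Returns true only if all of them finished successfully.
     */
    static boolean waitForAll(List<? extends JobStatus> jobs, long pollMillis) {
        for (JobStatus job : jobs) {
            // The jobs run concurrently, so waiting for them one by one is enough.
            while (!job.isComplete()) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        for (JobStatus job : jobs) {
            if (!job.isSuccessful()) return false;
        }
        return true;
    }

    /** Fake job for demonstration: reports completion after a fixed number of polls. */
    static class FakeJob implements JobStatus {
        private int pollsLeft;
        private final boolean success;

        FakeJob(int pollsLeft, boolean success) {
            this.pollsLeft = pollsLeft;
            this.success = success;
        }

        public boolean isComplete() {
            if (pollsLeft > 0) {
                pollsLeft--;
                return false;
            }
            return true;
        }

        public boolean isSuccessful() {
            return success;
        }
    }

    public static void main(String[] args) {
        List<FakeJob> jobs = Arrays.asList(new FakeJob(2, true), new FakeJob(0, true));
        System.out.println(waitForAll(jobs, 10)); // prints true
    }
}
```

With real Hadoop jobs, each job would first be started with ''job.submit'' and the loop would call ''job.isComplete'' and ''job.isSuccessful'' on the ''Job'' objects directly. When the second job needs the output of the first one, ''job.waitForCompletion'' on the first job is the simpler choice.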
  
 ===== Exercise 1 =====
 +Improve the [[.:step-25#exercise|sorting exercise]] to handle a [[.:step-13#nonuniform-data|nonuniform key distribution]]. As in the [[.:step-13#nonuniform-data|Perl solution]], run two Hadoop jobs (using one Java source file): the first samples the input and creates the separator, the second does the actual sorting.
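
The sampling idea can be illustrated without Hadoop. The sketch below assumes integer keys and an in-memory sample, and the class and method names (''Separators'', ''choose'', ''partition'') are made up for this illustration. It picks separators at evenly spaced ranks of the sorted sample and then maps a key to its output part.

```java
import java.util.Arrays;

public class Separators {
    /** Picks parts-1 separators at evenly spaced ranks of the sorted sample. */
    static int[] choose(int[] sample, int parts) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] seps = new int[parts - 1];
        for (int i = 1; i < parts; i++) {
            seps[i - 1] = sorted[(int) ((long) i * sorted.length / parts)];
        }
        return seps;
    }

    /** Returns the part of a key, i.e. the number of separators that are <= key. */
    static int partition(int[] seps, int key) {
        int lo = 0, hi = seps.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (seps[mid] <= key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] seps = choose(new int[] {60, 10, 50, 20, 40, 30}, 3);
        System.out.println(Arrays.toString(seps)); // prints [30, 50]
        System.out.println(partition(seps, 25));   // prints 0
        System.out.println(partition(seps, 30));   // prints 1
    }
}
```

In the exercise itself, the first job would produce the sample and the second job would use logic like ''partition'' to decide which reducer receives each key. How the separators are passed between the two jobs is left open here.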
 +
 +===== Exercise 2 =====
 +
 +Improve the [[.:step-27#exercise|inverted index creation exercise]] as follows:
 +  - In the first job, create a list of unique document names. Number the documents using their order in this list.
 +  - In the second job, create for each word a sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (unlike the previous exercise, where ''Text'' was used to identify the document).
 +
 +===== Exercise 3 =====
 +
 +Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
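
One possible shape of such a driver loop is sketched below. It assumes one-dimensional cluster centers and represents a single Hadoop job run by a plain function. The names ''IterativeDriver'' and ''iterate'', the threshold ''eps'' and the limit ''maxIter'' are assumptions of this sketch, not part of the exercise.

```java
import java.util.function.UnaryOperator;

public class IterativeDriver {
    /**
     * Applies step repeatedly until no center moves by more than eps
     * or maxIter iterations have been run. Returns the final centers.
     */
    static double[] iterate(double[] centers, UnaryOperator<double[]> step,
                            double eps, int maxIter) {
        double[] current = centers.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            // In a real driver this call would configure and run one Hadoop job
            // and read the new centers from the job output.
            double[] next = step.apply(current);
            double shift = 0;
            for (int i = 0; i < current.length; i++) {
                shift = Math.max(shift, Math.abs(next[i] - current[i]));
            }
            current = next;
            if (shift <= eps) break;
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy step that halves the single center, standing in for one job run.
        double[] result = iterate(new double[] {8.0}, c -> new double[] {c[0] / 2}, 0.5, 100);
        System.out.println(result[0]); // prints 0.5
    }
}
```

In the real solution, each iteration would create a new job, wait for it with ''job.waitForCompletion'', and stop the loop early if the job fails.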
 +
 +
 +----
 +
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-29|Step 29]]: Custom sorting and grouping comparators.<html></td>
 +</tr>
 +</table>
 +</html>
  
-Improve the last [[.:step-27#exercise|inverted index creation exercise]] to 
-  - in first job, create a list of unique document names. Number the documents using the order in this list. 
-  - in second job, create for each word sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document). 
