[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
courses:mapreduce-tutorial:step-28 [2012/01/28 21:42]
straka
courses:mapreduce-tutorial:step-28 [2012/01/31 13:07]
straka
Line 1: Line 1:
-====== MapReduce Tutorial : Custom input formats ======+====== MapReduce Tutorial : Running multiple Hadoop jobs in one class ====== 
 + 
 +The Java API offers possibility to submit multiple Hadoop job in one class. A job can be submitted either using 
 +  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not). 
 +  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method immediately returns. In this case, the state of the submitted job can be accessed using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]] 
 + 
 +===== Exercise 2 ===== 
 + 
 +Improve the last [[.:step-27#exercise|inverted index creation exercise]], such that 
 +  - in the first job, create a list of unique document names. Number the documents using the order in this list. 
 +  - in the second job, create for each word sorted list of ''DocWithOccurences<IntWritable>'', where the document is identified by its number (contrary to the previous exercise, where ''Text'' was used to identify the document). 
 + 
 +===== Exercise 3 ===== 
 + 
 +Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of an controlling script, use the Java class itself to execute the Hadoop job as many times as necessary. 
 + 
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-29|Step 29]]: Custom sorting and grouping comparators.<html></td> 
 +</tr> 
 +</table> 
 +</html>
  
-WholeFile 
-FileAsPath 
-ParagraphFile 

[ Back to the navigation ] [ Back to the content ]