Institute of Formal and Applied Linguistics Wiki

====== MapReduce Tutorial : Running multiple Hadoop jobs in one source file ======
  
The Java API offers the possibility to submit multiple Hadoop jobs from one source file. A job can be submitted using either
  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not).
  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be accessed using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]].
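
The two submission styles can be sketched in one driver as follows. This is a minimal illustration, not part of the tutorial's code: the class name is made up and the mapper/reducer configuration is elided.

<code java>
// Sketch of the two submission styles in one driver.
// Job configuration (mapper, reducer, input/output paths) is elided.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TwoSubmissionStyles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Style 1: blocking -- waitForCompletion submits the job and returns
    // only after it has finished; the boolean argument enables progress output.
    Job blocking = new Job(conf, "blocking-job");
    // ... set mapper, reducer, input/output paths here ...
    if (!blocking.waitForCompletion(true))
      System.exit(1);

    // Style 2: non-blocking -- submit returns immediately and the job runs
    // in the background; poll isComplete and then check isSuccessful.
    Job background = new Job(conf, "background-job");
    // ... set mapper, reducer, input/output paths here ...
    background.submit();
    while (!background.isComplete())
      Thread.sleep(1000);                 // poll once a second
    System.exit(background.isSuccessful() ? 0 : 1);
  }
}
</code>

The non-blocking style is useful when several independent jobs should run concurrently; for a simple sequential pipeline, waitForCompletion is usually enough.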
  
===== Exercise 1 =====

Improve the [[.:step-25#exercise|sorting exercise]] to handle a [[.:step-13#nonuniform-data|nonuniform key distribution]]. As in the [[.:step-13#nonuniform-data|Perl solution]], run two Hadoop jobs (from one Java source file) -- the first samples the input and computes the separators, the second does the real sorting.
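
The two-job structure could look roughly like the skeleton below. All class names and path conventions here are placeholders (the sampling logic and the separator-based partitioner are the substance of the exercise and are not shown).

<code java>
// Hypothetical skeleton: job 1 samples the input and writes separators,
// job 2 performs the actual sort. Mapper/reducer setup is elided.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NonuniformSort {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Job 1: sample the input and write the computed separators.
    Job sampling = new Job(conf, "sampling");
    // ... set the sampling mapper and reducer here ...
    FileInputFormat.addInputPath(sampling, new Path(args[0]));
    FileOutputFormat.setOutputPath(sampling, new Path(args[1] + "/separators"));
    if (!sampling.waitForCompletion(true))
      System.exit(1);

    // Job 2: sort for real, with a partitioner that loads the separators.
    Job sorting = new Job(conf, "sorting");
    // ... set the sorting mapper/reducer and the separator-based partitioner ...
    FileInputFormat.addInputPath(sorting, new Path(args[0]));
    FileOutputFormat.setOutputPath(sorting, new Path(args[1] + "/sorted"));
    System.exit(sorting.waitForCompletion(true) ? 0 : 1);
  }
}
</code>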
  
===== Exercise 2 =====

Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
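
A driver loop for the iterative job could be sketched as below. This is only one possible shape, with placeholder names throughout: a real solution would also test for convergence instead of relying on a fixed iteration limit.

<code java>
// Hypothetical K-means driver: one Hadoop job per iteration, each reading
// the previous iteration's centroids and writing new ones.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KMeansDriver {
  public static void main(String[] args) throws Exception {
    String input = args[0], workDir = args[1];
    int maxIterations = 10;                       // assumed limit; a real
                                                  // solution checks convergence

    for (int i = 0; i < maxIterations; i++) {
      Configuration conf = new Configuration();
      // Tell the mapper where the current centroids live (placeholder key).
      conf.set("kmeans.centroids", workDir + "/centroids-" + i);

      Job job = new Job(conf, "kmeans-iteration-" + i);
      // ... set the K-means mapper and reducer here ...
      FileInputFormat.addInputPath(job, new Path(input));
      FileOutputFormat.setOutputPath(job, new Path(workDir + "/centroids-" + (i + 1)));
      if (!job.waitForCompletion(true))
        System.exit(1);                           // abort on failure
    }
  }
}
</code>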

----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-28|Step 28]]: Custom data types.<html></td>
</tr>
</table>
</html>
  
