
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-27 [2012/01/28 17:53] straka
courses:mapreduce-tutorial:step-27 [2012/01/31 14:34] straka
Line 1: Line 1:
-====== MapReduce Tutorial : Custom data types ======
+====== MapReduce Tutorial : Running multiple Hadoop jobs in one source file ======
  
-An important feature of the Java API is that custom data and format types can be provided. In this step we implement two custom data types.
+The Java API makes it possible to submit multiple Hadoop jobs from one source file. A job can be submitted using either
 +  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not). 
 +  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be checked using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]].
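
The two submission styles above lead to two driver shapes: run the jobs one after another with the blocking call, or submit them all and poll. The following is a minimal sketch of that control flow only. To keep it runnable without a Hadoop installation, it does not use Hadoop classes: ''SubmittableJob'', ''FakeJob'', ''runInSequence'' and ''runConcurrently'' are hypothetical names invented for this illustration, and the interface merely mirrors the four ''Job'' methods linked above.

```java
// Stand-in sketch: SubmittableJob and FakeJob are hypothetical, NOT Hadoop classes.
final class TwoJobDriver {

    /** Hypothetical interface mirroring the four Job methods named above. */
    interface SubmittableJob {
        boolean waitForCompletion(boolean verbose); // blocks until the job ends
        void submit();                              // returns immediately
        boolean isComplete();
        boolean isSuccessful();
    }

    /** Toy job: sleeps on a background thread, then ends with a preset outcome. */
    static final class FakeJob implements SubmittableJob {
        private final boolean outcome;
        private final long millis;
        private volatile boolean complete;
        private Thread worker;

        FakeJob(boolean outcome, long millis) {
            this.outcome = outcome;
            this.millis = millis;
        }

        @Override
        public synchronized void submit() {
            if (worker != null) {
                return; // already submitted
            }
            worker = new Thread(() -> {
                try {
                    Thread.sleep(millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                complete = true;
            });
            worker.start();
        }

        @Override
        public boolean waitForCompletion(boolean verbose) {
            submit();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (verbose) {
                System.out.println("job finished, success=" + outcome);
            }
            return outcome;
        }

        @Override
        public boolean isComplete() {
            return complete;
        }

        @Override
        public boolean isSuccessful() {
            return complete && outcome;
        }
    }

    /** Blocking style: the second job starts only if the first one succeeded. */
    static boolean runInSequence(SubmittableJob first, SubmittableJob second) {
        if (!first.waitForCompletion(false)) {
            return false;
        }
        return second.waitForCompletion(false);
    }

    /** Non-blocking style: submit both jobs, then poll until both are complete. */
    static boolean runConcurrently(SubmittableJob a, SubmittableJob b) {
        a.submit();
        b.submit();
        try {
            while (!(a.isComplete() && b.isComplete())) {
                Thread.sleep(10); // polling interval chosen arbitrarily
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return a.isSuccessful() && b.isSuccessful();
    }

    public static void main(String[] args) {
        System.out.println("sequence:   " + runInSequence(new FakeJob(true, 20), new FakeJob(true, 20)));
        System.out.println("concurrent: " + runConcurrently(new FakeJob(true, 20), new FakeJob(false, 20)));
    }
}
```

In a real driver the same two shapes would be written against ''org.apache.hadoop.mapreduce.Job'' objects, whose methods also declare checked exceptions such as ''IOException'' that the stand-in omits. The sequential shape fits jobs where the second consumes the output of the first; the polling shape fits independent jobs.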
  
-===== BerIntWritable =====
+===== Exercise 1 =====
 +Improve the [[.:step-25#exercise|sorting exercise]] to handle a [[.:step-13#nonuniform-data|nonuniform key distribution]]. As in the [[.:step-13#nonuniform-data|Perl solution]], run two Hadoop jobs (using one Java source file): the first samples the input and creates the separator, the second does the actual sorting.
  
-===== PairWritable<AB> =====
+===== Exercise 2 =====
 + 
 +Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
 + 
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-28|Step 28]]: Running multiple Hadoop jobs in one source file.<html></td> 
 +</tr> 
 +</table> 
 +</html>
  
