
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-28 [2012/01/29 16:34]
straka
courses:mapreduce-tutorial:step-28 [2012/01/31 13:07]
straka
Line 5: Line 5:
   * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be queried using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]]
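The polling pattern that ''job.submit'' implies can be sketched in plain Java as follows. This is only an illustration: ''JobStatus'' is a hypothetical stand-in interface that mirrors the two query methods named above, so the sketch runs without a Hadoop installation. The real ''Job'' methods declare checked exceptions, which the stand-in omits, and the polling interval is an arbitrary assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class JobPolling {
    /** Hypothetical stand-in for the status methods of org.apache.hadoop.mapreduce.Job. */
    interface JobStatus {
        boolean isComplete();
        boolean isSuccessful();
    }

    /** Polls an already submitted job until it completes, then reports whether it succeeded. */
    static boolean waitFor(JobStatus job, long pollMillis) {
        while (!job.isComplete()) {
            try {
                Thread.sleep(pollMillis);  // the caller could do other work here instead
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted while waiting for the job", e);
            }
        }
        return job.isSuccessful();
    }

    public static void main(String[] args) {
        // Fake job that reports completion on the third status query.
        final AtomicInteger polls = new AtomicInteger();
        JobStatus fake = new JobStatus() {
            public boolean isComplete() { return polls.incrementAndGet() >= 3; }
            public boolean isSuccessful() { return true; }
        };
        System.out.println("success: " + waitFor(fake, 10));
    }
}
```

With a real job, the same loop would call ''job.isComplete()'' and ''job.isSuccessful()'' on the ''Job'' object after ''job.submit()'' and handle the exceptions those methods declare.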
  
 +===== Exercise 2 =====
 +
 +Improve the last [[.:step-27#exercise|inverted index creation exercise]] so that
 +  - the first job creates a list of unique document names. Number the documents using their order in this list.
 +  - the second job creates, for each word, a sorted list of ''DocWithOccurences<IntWritable>'', where each document is identified by its number (unlike the previous exercise, where ''Text'' was used to identify the document).
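The numbering step can be illustrated outside Hadoop with a minimal sketch. It assumes that the unique document names come out in sorted order (as the output of a single reducer would) and that numbering starts at 0. Both are assumptions, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

final class DocumentNumbering {
    /** Maps every distinct document name to its position in the sorted list of unique names. */
    static Map<String, Integer> numberDocuments(Collection<String> names) {
        TreeSet<String> unique = new TreeSet<>(names);      // removes duplicates and sorts
        Map<String, Integer> ids = new LinkedHashMap<>();   // keeps the numbering order
        int next = 0;                                       // assumption: numbering starts at 0
        for (String name : unique) {
            ids.put(name, next++);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(numberDocuments(Arrays.asList("b.txt", "a.txt", "b.txt")));
    }
}
```

In the second job, a mapping of this kind would let the mapper emit an ''IntWritable'' document number in place of the ''Text'' document name.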
 +
 +===== Exercise 3 =====
 +
 +Implement the [[.:step-15|K-means clustering exercise]] in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
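The shape of such a self-driving class can be sketched as a loop that reruns one iteration until the cluster centers stop moving. The sketch below is plain Java on one-dimensional points, not a Hadoop solution: ''runIteration'' is a hypothetical stand-in for configuring and running one Hadoop job, and the convergence threshold and iteration limit are assumed parameters.

```java
import java.util.Arrays;

final class IterativeDriver {
    /** Stand-in for one Hadoop job: assigns points to the nearest center, returns the new centers. */
    static double[] runIteration(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) best = c;
            }
            sum[best] += p;
            count[best]++;
        }
        double[] next = new double[centers.length];
        for (int c = 0; c < centers.length; c++) {
            // A center with no assigned points keeps its previous position.
            next[c] = count[c] == 0 ? centers[c] : sum[c] / count[c];
        }
        return next;
    }

    /** Driver loop: repeats the iteration until no center moves by epsilon or more. */
    static double[] cluster(double[] points, double[] initial, double epsilon, int maxIterations) {
        double[] centers = initial.clone();
        for (int i = 0; i < maxIterations; i++) {
            double[] next = runIteration(points, centers);
            double shift = 0;
            for (int c = 0; c < centers.length; c++) {
                shift = Math.max(shift, Math.abs(next[c] - centers[c]));
            }
            centers = next;
            if (shift < epsilon) break;  // converged, no further job is needed
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        System.out.println(Arrays.toString(cluster(points, new double[] {0, 5}, 1e-9, 100)));
    }
}
```

In a Hadoop version, each pass through the loop would create a fresh ''Job'', run it to completion, and read the new centers from its output before deciding whether to continue.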
 +
 +
 +----
 +
 +<html>
 +<table style="width:100%">
 +<tr>
 +<td style="text-align:left; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td>
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
 +<td style="text-align:right; width: 33%; "></html>[[step-29|Step 29]]: Custom sorting and grouping comparators.<html></td>
 +</tr>
 +</table>
 +</html>
  
