
Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:step-28 [2012/01/28 17:54] straka (created)
courses:mapreduce-tutorial:step-28 [2012/01/29 16:38] straka
====== MapReduce Tutorial : Running multiple Hadoop jobs in one class ======

The Java API makes it possible to submit multiple Hadoop jobs from one class. A job can be submitted using either
  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#waitForCompletion(boolean)|job.waitForCompletion]] -- the job is submitted and the method waits for it to finish (successfully or not), returning ''true'' if the job succeeded, or
  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#submit()|job.submit]] -- the job is submitted and the method returns immediately. In this case, the state of the submitted job can be queried using [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()|job.isComplete]] and [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()|job.isSuccessful]].
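
Running several independent jobs at the same time relies on the ''submit'' style. A minimal sketch of that submit-then-poll pattern follows, written in plain Java so that it compiles without the Hadoop jars. It assumes a hypothetical ''SubmittableJob'' interface whose methods mirror ''Job.submit'', ''Job.isComplete'' and ''Job.isSuccessful''; the names ''JobPolling'' and ''runConcurrently'' and the polling interval are illustrative, not part of Hadoop.

```java
import java.util.List;

public class JobPolling {
    /**
     * Hypothetical stand-in for the three methods of
     * org.apache.hadoop.mapreduce.Job used below.
     */
    public interface SubmittableJob {
        void submit() throws Exception;          // start the job and return immediately
        boolean isComplete() throws Exception;   // has the job finished (successfully or not)?
        boolean isSuccessful() throws Exception; // did the finished job succeed?
    }

    /**
     * Submits every job without blocking, then polls until all of them have
     * finished. Returns true only if every job finished successfully.
     */
    public static boolean runConcurrently(List<? extends SubmittableJob> jobs, long pollMillis)
            throws Exception {
        for (SubmittableJob job : jobs) {
            job.submit();
        }
        while (true) {
            boolean allComplete = true;
            for (SubmittableJob job : jobs) {
                if (!job.isComplete()) {
                    allComplete = false;
                    break;
                }
            }
            if (allComplete) {
                break;
            }
            Thread.sleep(pollMillis); // avoid busy-waiting between status checks
        }
        for (SubmittableJob job : jobs) {
            if (!job.isSuccessful()) {
                return false;
            }
        }
        return true;
    }
}
```

With Hadoop on the classpath, the same loop can be written directly over ''org.apache.hadoop.mapreduce.Job'' objects, since those three methods exist on ''Job''. A job that reads another job's output should instead be started only after ''waitForCompletion'' on the job it depends on has returned ''true''.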

===== Exercise 1 =====

Improve the [[.:step-27#exercise|inverted index creation exercise]] from the previous step, so that
  - in the first job, a list of unique document names is created and the documents are numbered according to their order in this list;
  - in the second job, a sorted list of ''DocWithOccurences<IntWritable>'' is created for each word, where each document is identified by its number (unlike the previous exercise, where ''Text'' was used to identify the document).
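
To make the intended result concrete, the computation of both jobs can be simulated in memory. The following plain-Java sketch is not a MapReduce solution; it only shows what the two jobs are expected to produce, under these assumptions: the input is a hypothetical map from document name to its words, the unique names are sorted lexicographically (similar to the sorted key order a single reducer sees) and numbered from 0, and the hypothetical ''DocOccurrences'' class is a simplified stand-in for ''DocWithOccurences<IntWritable>'' that stores the positions of the word in the document. All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class DocNumberingSketch {
    /**
     * Hypothetical, simplified stand-in for DocWithOccurences<IntWritable>:
     * a document number and the (0-based) positions of the word in it.
     */
    public static final class DocOccurrences {
        public final int doc;
        public final List<Integer> positions;

        DocOccurrences(int doc, List<Integer> positions) {
            this.doc = doc;
            this.positions = positions;
        }
    }

    /**
     * First job: the unique document names in sorted order.
     * A document's number is its index in this list (0-based here).
     */
    public static Map<String, Integer> numberDocuments(Collection<String> docNames) {
        Map<String, Integer> numbers = new HashMap<>();
        int next = 0;
        for (String name : new TreeSet<>(docNames)) { // TreeSet removes duplicates and sorts
            numbers.put(name, next++);
        }
        return numbers;
    }

    /**
     * Second job: for each word, the documents containing it, identified by
     * their number and sorted by that number.
     */
    public static SortedMap<String, List<DocOccurrences>> buildIndex(
            Map<String, List<String>> docs) {
        Map<String, Integer> numbers = numberDocuments(docs.keySet());
        // word -> (document number -> positions); TreeMaps keep both levels sorted
        SortedMap<String, SortedMap<Integer, List<Integer>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : docs.entrySet()) {
            int doc = numbers.get(entry.getKey());
            List<String> words = entry.getValue();
            for (int pos = 0; pos < words.size(); pos++) {
                grouped.computeIfAbsent(words.get(pos), w -> new TreeMap<>())
                       .computeIfAbsent(doc, d -> new ArrayList<>())
                       .add(pos);
            }
        }
        SortedMap<String, List<DocOccurrences>> index = new TreeMap<>();
        for (Map.Entry<String, SortedMap<Integer, List<Integer>>> entry : grouped.entrySet()) {
            List<DocOccurrences> postings = new ArrayList<>();
            for (Map.Entry<Integer, List<Integer>> d : entry.getValue().entrySet()) {
                postings.add(new DocOccurrences(d.getKey(), d.getValue()));
            }
            index.put(entry.getKey(), postings);
        }
        return index;
    }
}
```

In the MapReduce version, the numbering produced by the first job has to reach the second job somehow, for example by having the driver read the first job's output once ''waitForCompletion'' has returned ''true''.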

===== Exercise 2 =====

