Institute of Formal and Applied Linguistics Wiki


MapReduce Tutorial : Running multiple Hadoop jobs in one class

The Java API makes it possible to submit multiple Hadoop jobs from one class. A job can be submitted either using

Exercise 1

Improve the previous inverted index creation exercise so that:

  1. the first job creates a list of unique document names. Number the documents according to their order in this list.
  2. the second job creates, for each word, a sorted list of DocWithOccurences<IntWritable>, where the document is identified by its number (unlike the previous exercise, where Text was used to identify the document).
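
The data flow of the two jobs can be illustrated in plain Java, without Hadoop. This is only an in-memory sketch under stated assumptions: the class name DocNumbering, the method names and the two sample documents are hypothetical, "the order in this list" is assumed to mean sorted order with numbering starting at 0, and a plain {document number, occurrences} pair stands in for DocWithOccurences<IntWritable>. In the exercise itself, each step would be a separate Hadoop job.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class DocNumbering {
    // Stand-in for the first job: sort the unique document names and
    // number them by their position in that sorted list (assumed 0-based).
    static Map<String, Integer> numberDocuments(Collection<String> names) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String name : new TreeSet<>(names)) {
            ids.put(name, ids.size());
        }
        return ids;
    }

    // Stand-in for the second job: for each word, a list of
    // {document number, occurrences} pairs sorted by document number.
    static Map<String, List<int[]>> invertedIndex(Map<String, String> docs,
                                                  Map<String, Integer> ids) {
        // word -> (document number -> count); TreeMap keeps both levels sorted.
        Map<String, TreeMap<Integer, Integer>> counts = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            int id = ids.get(doc.getKey());
            for (String word : doc.getValue().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty token
                }
                counts.computeIfAbsent(word, w -> new TreeMap<>())
                      .merge(id, 1, Integer::sum);
            }
        }
        Map<String, List<int[]>> index = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Integer, Integer>> e : counts.entrySet()) {
            List<int[]> postings = new ArrayList<>();
            for (Map.Entry<Integer, Integer> p : e.getValue().entrySet()) {
                postings.add(new int[] {p.getKey(), p.getValue()});
            }
            index.put(e.getKey(), postings);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: document name -> document text.
        Map<String, String> docs = new HashMap<>();
        docs.put("b.txt", "hadoop job job");
        docs.put("a.txt", "hadoop map");

        Map<String, Integer> ids = numberDocuments(docs.keySet());
        Map<String, List<int[]>> index = invertedIndex(docs, ids);
        for (Map.Entry<String, List<int[]>> e : index.entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey()).append(':');
            for (int[] p : e.getValue()) {
                line.append(" (doc ").append(p[0]).append(", ").append(p[1]).append("x)");
            }
            System.out.println(line);
        }
    }
}
```

For the two sample documents, a.txt gets number 0 and b.txt gets number 1, so the word "job" maps to a single entry for document 1 with two occurrences. The point of the split is that the second step needs the complete numbering before it can start, which is why the two jobs have to run one after the other.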

Exercise 2

Implement the K-means clustering exercise in Java. Instead of a controlling script, use the Java class itself to execute the Hadoop job as many times as necessary.
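
The controlling loop that replaces the script can be sketched in plain Java. This is a minimal sketch, not the reference solution: the names KMeansDriver, runUntilConverged, step, maxIterations and epsilon are hypothetical, and a one-dimensional in-memory K-means step stands in for the Hadoop job. In an actual driver, each pass of the loop would configure a new Hadoop Job, wait for it to finish, and read the new cluster centers from its output.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

class KMeansDriver {
    // Repeats one K-means step until no center moves by more than epsilon
    // or maxIterations is reached, and returns the final centers.
    static double[] runUntilConverged(double[] centers, UnaryOperator<double[]> step,
                                      int maxIterations, double epsilon) {
        for (int i = 0; i < maxIterations; i++) {
            double[] next = step.apply(centers); // stand-in for one Hadoop job run
            double shift = 0;
            for (int c = 0; c < centers.length; c++) {
                shift = Math.max(shift, Math.abs(next[c] - centers[c]));
            }
            centers = next;
            if (shift <= epsilon) {
                break; // converged, no further job is needed
            }
        }
        return centers;
    }

    // One in-memory K-means step on 1-D points: assign each point to the
    // nearest center, then move each center to the mean of its points.
    static double[] step(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                    best = c;
                }
            }
            sum[best] += p;
            count[best]++;
        }
        double[] next = new double[centers.length];
        for (int c = 0; c < centers.length; c++) {
            // An empty cluster keeps its previous center.
            next[c] = count[c] == 0 ? centers[c] : sum[c] / count[c];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centers = runUntilConverged(new double[] {0, 5}, c -> step(points, c), 20, 1e-9);
        System.out.println(Arrays.toString(centers)); // prints [2.0, 11.0]
    }
}
```

The iteration cap guards against a run that never satisfies the convergence test. The threshold and the cap are design choices of this sketch, not values prescribed by the exercise.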


Step 27: Custom data types. | Overview | Step 29: Custom input formats.