courses:mapreduce-tutorial:step-24 [2012/01/31 16:25] (current) dusek

  * ''net/projects/hadoop/bin/hadoop job.jar -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] [-Dname=value -Dname=value ...] input output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.

===== Exercise 1 =====
Download the ''MapperOnlyHadoopJob.java'', compile it and run it using
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-24?codeblock=1' -O 'MapperOnlyHadoopJob.java'
===== Counters =====

As in the Perl API, a mapper (or a reducer) can increment various counters by using ''context.getCounter("Group", "Name").increment(value)'':
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  context.getCounter("Mapper", "Number of keys").increment(1);
  ...
}
</code>
The ''getCounter'' method returns a [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Counter.html|Counter]] object, so if a counter is incremented frequently, ''getCounter'' can be called just once and the returned ''Counter'' reused:
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  Counter words = context.getCounter("Mapper", "Number of words");
  for (String word : value.toString().split("\\W+")) {
    ...
    words.increment(1);
  }
}
</code>

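The ''value.toString().split("\\W+")'' call above splits the input text on runs of non-word characters. Its behavior is easy to check in plain Java, without a Hadoop cluster (a standalone sketch; the class name and the sample text are made up for illustration):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // Same tokenization as in the mapper: split on runs of non-word characters.
        String value = "Three-letter words: foo, bar and baz!";
        String[] words = value.split("\\W+");
        for (String word : words) {
            System.out.println(word);
        }
        // Trailing separators (the "!") produce no trailing empty tokens.
        System.out.println("count=" + words.length);
    }
}
```

Note one regex pitfall: trailing empty strings are removed by ''split'', but if the input *starts* with a separator, the first returned token is an empty string, which a real mapper may want to skip.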
===== Exercise 2 =====

Run a Hadoop job on /home/straka/wiki/cs-text-small which filters the documents so that only three-letter words remain. Also use counters to compute the histogram of word lengths and the percentage of three-letter words in the documents. You can download the template {{:courses:mapreduce-tutorial:step-24.txt|ThreeLetterWords.java}} and execute it.

  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-24.txt' -O 'ThreeLetterWords.java'
  # NOW VIEW THE FILE
  # $EDITOR ThreeLetterWords.java
  make -f /net/projects/hadoop/java/Makefile ThreeLetterWords.jar
  rm -rf step-24-out-sol; /net/projects/hadoop/bin/hadoop ThreeLetterWords.jar -r 0 /home/straka/wiki/cs-text-small step-24-out-sol
  less step-24-out-sol/part-*

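The counter arithmetic for the exercise can be prototyped offline in plain Java before wiring it into the job (a standalone sketch with made-up sample text; in the actual mapper each ''lengths.merge(...)'' call would correspond to an increment of a Hadoop counter such as ''context.getCounter("Length", ...)'', where the group name "Length" is our assumption, not something prescribed by the template):

```java
import java.util.Map;
import java.util.TreeMap;

public class LengthHistogram {
    public static void main(String[] args) {
        String text = "one two three four five six seven";
        Map<Integer, Long> lengths = new TreeMap<>();  // word length -> count
        long total = 0, threeLetter = 0;
        for (String word : text.split("\\W+")) {
            if (word.isEmpty()) continue;              // skip empty token after a leading separator
            lengths.merge(word.length(), 1L, Long::sum);
            total++;
            if (word.length() == 3) threeLetter++;
        }
        System.out.println("histogram=" + lengths);
        System.out.printf("three-letter words: %.1f%%%n", 100.0 * threeLetter / total);
    }
}
```

In the Hadoop job itself the final division has to happen after the job finishes, in the driver, because counters are only aggregated across all tasks once the job completes.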
----