Institute of Formal and Applied Linguistics Wiki


  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=0' -O 'WordCount.java'
  make -f /net/projects/hadoop/java/Makefile WordCount.jar
  rm -rf step-25-out-ex1; /net/projects/hadoop/bin/hadoop WordCount.jar -r 1 /home/straka/wiki/cs-text-small step-25-out-ex1
  less step-25-out-ex1/part-*
  
  * ''Reducer<Text, IntWritable, Text, NullWritable>'' which discards the value and outputs the key only.
The solution is a bit clumsy. If the mapper could output (key, value, partition) instead of just (key, value), we would not have to use the ''value'' as a partition number and the types would be simplified.
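For illustration, such a reducer could be written as the following minimal sketch (the class name ''KeyOnlyReducer'' is invented for this sketch; the actual implementation is part of ''ArticlesAndWords.java'' below):

<code java>
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

// Sketch of a reducer which ignores the incoming IntWritable values
// (in this solution they only carry the partition number) and emits
// every key exactly once, with a NullWritable value.
public class KeyOnlyReducer extends Reducer<Text, IntWritable, Text, NullWritable> {
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    context.write(key, NullWritable.get());
  }
}
</code>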
**Remark:** If the type of keys or values output by the mapper //differs// from the type of keys and values output by the reducer, then [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setMapOutputKeyClass(java.lang.Class)|job.setMapOutputKeyClass]] or [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setMapOutputValueClass(java.lang.Class)|job.setMapOutputValueClass]] must be called. If they are not called, the keys and values produced by the mapper are expected to have the same types as those produced by the reducer.
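As an illustration of this remark, assuming a ''Job'' instance named ''job'' for a job whose mapper emits (''Text'', ''IntWritable'') pairs while its reducer emits (''Text'', ''NullWritable'') pairs, the type declarations could look like the following sketch:

<code java>
// Final (reducer) output types:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);

// The mapper's output value type differs from the reducer's, so it
// must be declared explicitly. The output key type is Text in both
// cases, so job.setMapOutputKeyClass is not needed in this sketch.
job.setMapOutputValueClass(IntWritable.class);
</code>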
  
<file java ArticlesAndWords.java>
    // ...
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setInputFormatClass(KeyValueTextInputFormat.class);
    // ...
</file>
  
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=3' -O 'ArticlesAndWords.java'
  make -f /net/projects/hadoop/java/Makefile ArticlesAndWords.jar
  rm -rf step-25-out-ex2; /net/projects/hadoop/bin/hadoop ArticlesAndWords.jar -c 2 -r 2 /home/straka/wiki/cs-text-small step-25-out-ex2
  less step-25-out-ex2/part-*
===== Exercise =====

Implement the [[.:step-13|sorting exercise]] in Java -- only the part with uniform keys.

**Remark:** Values of type ''Text'' are sorted lexicographically, but values of type ''IntWritable'' are sorted according to value. Your mapper should therefore produce pairs of types (''IntWritable'', ''Text'').
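For illustration, such a mapper could look like the following sketch, which assumes (as in the examples above) that the input is read by ''KeyValueTextInputFormat'' and that the key of every pair is the number to sort by; the class name ''SortingMapper'' is invented for this sketch:

<code java>
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

// Sketch: re-emit every (key, value) pair with the key parsed as an
// integer, so that the keys are sorted numerically rather than
// lexicographically during the shuffle phase.
public class SortingMapper extends Mapper<Text, Text, IntWritable, Text> {
  public void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(new IntWritable(Integer.parseInt(key.toString())), value);
  }
}
</code>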
You can download the {{:courses:mapreduce-tutorial:step-25.txt|Sorting.java}} template and execute it.

  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-25.txt' -O 'Sorting.java'
  # NOW VIEW THE FILE
  # $EDITOR Sorting.java
  make -f /net/projects/hadoop/java/Makefile Sorting.jar
  rm -rf step-25-out-sol; /net/projects/hadoop/bin/hadoop Sorting.jar -r 0 /home/straka/wiki/cs-text-small step-25-out-sol
  less step-25-out-sol/part-*
----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-24|Step 24]]: Mappers, running Java Hadoop jobs, counters.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td>
</tr>
</table>
</html>
  
