Institute of Formal and Applied Linguistics Wiki


  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=0' -O 'WordCount.java'
  make -f /net/projects/hadoop/java/Makefile WordCount.jar
  rm -rf step-25-out-ex1; /net/projects/hadoop/bin/hadoop WordCount.jar -r 1 /home/straka/wiki/cs-text-small step-25-out-ex1
  less step-25-out-ex1/part-*
  
  * ''Reducer<Text, IntWritable, Text, NullWritable>'' which discards the value and outputs the key only.
The solution is a bit clumsy. If the mapper could output (key, value, partition) instead of just (key, value), we would not have to use the ''value'' as a partition number and the types would be simplified.
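For illustration, such a reducer could be written as the following minimal sketch (the class name ''KeyOnlyReducer'' is invented for this sketch; the actual implementation is part of ''ArticlesAndWords.java'' below):

<code java>
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

// Sketch of a reducer which ignores the incoming IntWritable values
// (in this solution they only carry the partition number) and emits
// every key exactly once, with a NullWritable value.
public class KeyOnlyReducer extends Reducer<Text, IntWritable, Text, NullWritable> {
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    context.write(key, NullWritable.get());
  }
}
</code>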
**Remark:** If the type of keys or values output by the mapper //differs// from the type of keys and values output by the reducer, then [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setMapOutputKeyClass(java.lang.Class)|job.setMapOutputKeyClass]] or [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setMapOutputValueClass(java.lang.Class)|job.setMapOutputValueClass]] must be called. If they are not called, the keys and values produced by the mapper are expected to have the same types as those produced by the reducer.
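As an illustration of this remark, assuming a ''Job'' instance named ''job'' for a job whose mapper emits (''Text'', ''IntWritable'') pairs while its reducer emits (''Text'', ''NullWritable'') pairs, the type declarations could look like the following sketch:

<code java>
// Final (reducer) output types:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);

// The mapper's output value type differs from the reducer's, so it
// must be declared explicitly. The output key type is Text in both
// cases, so job.setMapOutputKeyClass is not needed in this sketch.
job.setMapOutputValueClass(IntWritable.class);
</code>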
  
<file java ArticlesAndWords.java>
    // ...
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setInputFormatClass(KeyValueTextInputFormat.class);
    // ...
</file>
  
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=3' -O 'ArticlesAndWords.java'
  make -f /net/projects/hadoop/java/Makefile ArticlesAndWords.jar
  rm -rf step-25-out-ex2; /net/projects/hadoop/bin/hadoop ArticlesAndWords.jar -c 2 -r 2 /home/straka/wiki/cs-text-small step-25-out-ex2
  less step-25-out-ex2/part-*
===== Exercise =====

Implement the [[.:step-13|sorting exercise]] in Java -- only the part with uniform keys.

**Remark:** Values of type ''Text'' are sorted lexicographically, but values of type ''IntWritable'' are sorted according to value. Your mapper should therefore produce pairs of types (''IntWritable'', ''Text'').
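For illustration, such a mapper could look like the following sketch, which assumes (as in the examples above) that the input is read by ''KeyValueTextInputFormat'' and that the key of every pair is the number to sort by; the class name ''SortingMapper'' is invented for this sketch:

<code java>
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

// Sketch: re-emit every (key, value) pair with the key parsed as an
// integer, so that the keys are sorted numerically rather than
// lexicographically during the shuffle phase.
public class SortingMapper extends Mapper<Text, Text, IntWritable, Text> {
  public void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(new IntWritable(Integer.parseInt(key.toString())), value);
  }
}
</code>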
You can download the {{:courses:mapreduce-tutorial:step-25.txt|Sorting.java}} template and execute it.

  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-25.txt' -O 'Sorting.java'
  # NOW VIEW THE FILE
  # $EDITOR Sorting.java
  make -f /net/projects/hadoop/java/Makefile Sorting.jar
  rm -rf step-25-out-sol; /net/projects/hadoop/bin/hadoop Sorting.jar -r 0 /home/straka/wiki/cs-text-small step-25-out-sol
  less step-25-out-sol/part-*
----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-24|Step 24]]: Mappers, running Java Hadoop jobs, counters.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td>
</tr>
</table>
</html>
  
