courses:mapreduce-tutorial:step-25 [2012/01/30 15:40] majlis |
courses:mapreduce-tutorial:step-25 [2012/01/31 15:12] (current) majlis Fixed code for sorting execution. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=0' -O 'WordCount.java' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=0' -O 'WordCount.java' |
make -f /net/projects/hadoop/java/Makefile WordCount.jar | make -f /net/projects/hadoop/java/Makefile WordCount.jar |
rm -rf step-25-out-ex1; /net/projects/hadoop/bin/hadoop -r 0 WordCount.jar /home/straka/wiki/cs-text-small step-25-out-ex1 | rm -rf step-25-out-ex1; /net/projects/hadoop/bin/hadoop WordCount.jar -r 1 /home/straka/wiki/cs-text-small step-25-out-ex1 |
less step-25-out-ex1/part-* | less step-25-out-ex1/part-* |
| |
* ''Reducer<Text, IntWritable, Text, NullWritable>'' which discards the value and outputs key only. | * ''Reducer<Text, IntWritable, Text, NullWritable>'' which discards the value and outputs key only. |
The solution is a bit clumsy. If the mapper could output (key, value, partition) instead of just (key, value), we would not have to use the ''value'' as a partition number and the types would be simplified. | The solution is a bit clumsy. If the mapper could output (key, value, partition) instead of just (key, value), we would not have to use the ''value'' as a partition number and the types would be simplified. |
| |
| **Remark:** If the type of keys or values output by the mapper //differs// from the type of keys or values output by the reducer, then [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setMapOutputKeyClass(java.lang.Class)|job.setMapOutputKeyClass]] or [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html#setMapOutputValueClass(java.lang.Class)|job.setMapOutputValueClass]] must be called. Otherwise, the mapper's output key and value types are assumed to be the same as the reducer's. |
| |
<file java ArticlesAndWords.java> | <file java ArticlesAndWords.java> |
job.setOutputKeyClass(Text.class); | job.setOutputKeyClass(Text.class); |
job.setOutputValueClass(NullWritable.class); | job.setOutputValueClass(NullWritable.class); |
| job.setMapOutputValueClass(IntWritable.class); |
| |
job.setInputFormatClass(KeyValueTextInputFormat.class); | job.setInputFormatClass(KeyValueTextInputFormat.class); |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=3' -O 'ArticlesAndWords.java' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-25?codeblock=3' -O 'ArticlesAndWords.java' |
make -f /net/projects/hadoop/java/Makefile ArticlesAndWords.jar | make -f /net/projects/hadoop/java/Makefile ArticlesAndWords.jar |
rm -rf step-25-out-ex2; /net/projects/hadoop/bin/hadoop -r 0 ArticlesAndWords.jar /home/straka/wiki/cs-text-small step-25-out-ex2 | rm -rf step-25-out-ex2; /net/projects/hadoop/bin/hadoop ArticlesAndWords.jar -c 2 -r 2 /home/straka/wiki/cs-text-small step-25-out-ex2 |
less step-25-out-ex2/part-* | less step-25-out-ex2/part-* |
| |
| ===== Exercise ===== |
| |
| Implement the [[.:step-13|sorting exercise]] in Java -- only the part with uniform keys. |
| |
| **Remark:** Values of type ''Text'' are sorted lexicographically, whereas values of type ''IntWritable'' are sorted numerically. Your mapper should therefore produce pairs of type (''IntWritable'', ''Text''). |
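The difference between the two orderings can be illustrated with a minimal sketch in plain Java. It does not use any Hadoop classes: ''String'' stands in for ''Text'' and ''Integer'' for ''IntWritable'', and the class name ''SortOrderDemo'' and its methods are hypothetical names chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; String and Integer are stand-ins for Text and IntWritable.
class SortOrderDemo {
    // Sorts the values as text, comparing character by character.
    static List<String> sortAsText(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    // Parses each value as an int and sorts by numeric value.
    static List<Integer> sortAsNumbers(List<String> values) {
        List<Integer> sorted = new ArrayList<>();
        for (String value : values) {
            sorted.add(Integer.parseInt(value));
        }
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("10", "9", "100");
        System.out.println(sortAsText(input));    // prints [10, 100, 9]
        System.out.println(sortAsNumbers(input)); // prints [9, 10, 100]
    }
}
```

Sorted as text, "100" comes before "9" because the comparison stops at the first differing character. This is why the numbers should be emitted as ''IntWritable'' keys rather than ''Text'' keys.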
| |
| You can download the {{:courses:mapreduce-tutorial:step-25.txt|Sorting.java}} template and execute it. |
| |
| wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-25.txt' -O 'SortingUniform.java' |
| # NOW VIEW THE FILE |
| # $EDITOR SortingUniform.java |
| make -f /net/projects/hadoop/java/Makefile SortingUniform.jar |
| rm -rf step-25-out-uniform; /net/projects/hadoop/bin/hadoop SortingUniform.jar -c 2 -r 2 /net/projects/hadoop/examples/inputs/numbers-small step-25-out-uniform |
| less step-25-out-uniform/part-* |
| |
| wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-25.txt' -O 'SortingNonuniform.java' |
| # NOW VIEW THE FILE |
| # $EDITOR SortingNonuniform.java |
| make -f /net/projects/hadoop/java/Makefile SortingNonuniform.jar |
| rm -rf step-25-out-nonuniform; /net/projects/hadoop/bin/hadoop SortingNonuniform.jar -c 2 -r 2 /net/projects/hadoop/examples/inputs/nonuniform-small step-25-out-nonuniform |
| less step-25-out-nonuniform/part-* |
| |
---- | ---- |
<table style="width:100%"> | <table style="width:100%"> |
<tr> | <tr> |
<td style="text-align:left; width: 33%; "></html>[[step-24|Step 24]]: Mappers, running Java Hadoop jobs.<html></td> | <td style="text-align:left; width: 33%; "></html>[[step-24|Step 24]]: Mappers, running Java Hadoop jobs, counters.<html></td> |
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> | <td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> |
<td style="text-align:right; width: 33%; "></html>[[step-26|Step 26]]: Counters, compression and job configuration.<html></td> | <td style="text-align:right; width: 33%; "></html>[[step-26|Step 26]]: Compression and job configuration.<html></td> |
</tr> | </tr> |
</table> | </table> |
</html> | </html> |
| |