courses:mapreduce-tutorial:step-31 [2012/02/06 08:50] straka |
courses:mapreduce-tutorial:step-31 [2012/02/06 13:10] straka |
It is crucial that all the mappers run simultaneously. This can be achieved using the ''/net/projects/hadoop/bin/compute-splitsize'' script: for a given Hadoop input and a requested number of mappers, it computes the appropriate split size. | It is crucial that all the mappers run simultaneously. This can be achieved using the ''/net/projects/hadoop/bin/compute-splitsize'' script: for a given Hadoop input and a requested number of mappers, it computes the appropriate split size. |
| |
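The page does not show how ''compute-splitsize'' arrives at its answer. The following is a minimal sketch of one way such a value could be derived, assuming each input file is cut independently into pieces of at most the split size and that every piece becomes one mapper. The class and method names are hypothetical, and real Hadoop input formats apply further rules that are ignored here.

```java
import java.util.Arrays;

/** Sketch only: pick a split size so that an input needs at most a requested number of mappers. */
final class SplitSizeSketch {

    /** Number of splits when every file is cut into pieces of at most splitSize bytes. */
    static long countSplits(long[] fileSizes, long splitSize) {
        long splits = 0;
        for (long size : fileSizes) {
            splits += (size + splitSize - 1) / splitSize; // ceiling division; an empty file adds 0
        }
        return splits;
    }

    /** Smallest split size in bytes for which the input needs at most `mappers` splits. */
    static long splitSize(long[] fileSizes, int mappers) {
        long nonEmptyFiles = Arrays.stream(fileSizes).filter(s -> s > 0).count();
        if (mappers < 1 || nonEmptyFiles > mappers) {
            // Assumption: a split never spans two files, so each non-empty file needs a mapper.
            throw new IllegalArgumentException("need at least one mapper per non-empty file");
        }
        long lo = 1;
        long hi = Math.max(1, Arrays.stream(fileSizes).max().orElse(1));
        // The split count never grows as the split size grows, so binary search applies.
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (countSplits(fileSizes, mid) <= mappers) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Two files of 100 and 50 bytes, three mappers requested.
        System.out.println(splitSize(new long[] {100, 50}, 3)); // prints 50
    }
}
```

With a split size of 50 bytes the first file yields two splits and the second yields one, so exactly three mappers are started; any smaller value would require more.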
When the computation finishes, only one of the mappers should print the results, as all of them have the same results. For simplicity, the ''cooperate'' method has a ''boolean shouldWrite'' argument, which is set in exactly one mapper. | When the computation finishes, only one of the mappers should print the results, as all of them have the same results. For simplicity, the ''cooperate'' method has a ''boolean writeResults'' argument, which is set in exactly one mapper. |
| |
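The single-writer rule can be illustrated with a small simulation. This is a sketch under stated assumptions, not the tutorial's API: the ''cooperate'' signature below is hypothetical (only the ''writeResults'' flag comes from the text), the mappers are simulated in a plain loop, and the choice of mapper 0 as the writer is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every mapper knows the global result, but just one of them writes it. */
final class WriteResultsSketch {

    /** Hypothetical stand-in for the cooperative step: returns the output line or null. */
    static String cooperate(long globalSum, boolean writeResults) {
        return writeResults ? "sum\t" + globalSum : null;
    }

    /** Simulates the mappers one after another; inputPerMapper[m] is the split of mapper m. */
    static List<String> run(long[][] inputPerMapper) {
        // Each mapper first sums its own split.
        long[] localSums = new long[inputPerMapper.length];
        for (int m = 0; m < inputPerMapper.length; m++) {
            for (long value : inputPerMapper[m]) {
                localSums[m] += value;
            }
        }
        // Stand-in for the all-reduce: afterwards every mapper holds the same total.
        long globalSum = 0;
        for (long local : localSums) {
            globalSum += local;
        }
        // All mappers take part, but only the designated one emits the result.
        List<String> output = new ArrayList<>();
        for (int m = 0; m < inputPerMapper.length; m++) {
            boolean writeResults = (m == 0); // assumption: mapper 0 is the designated writer
            String line = cooperate(globalSum, writeResults);
            if (line != null) {
                output.add(line);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(new long[][] {{1, 2}, {3}, {4, 5}})); // one line, total 15
    }
}
```

Without the flag the output would contain the same line once per mapper, which is the duplication the argument is meant to prevent.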
===== Example ===== | ===== Example ===== |
You can run the example locally using: | You can run the example locally using: |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-31?codeblock=0' -O Sum.java | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-31?codeblock=0' -O Sum.java |
make -f /net/projects/hadoop/java/Makefile Sum.java | make -f /net/projects/hadoop/java/Makefile Sum.jar |
rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Sum.jar /net/projects/hadoop/examples/inputs/numbers-small step-31-out | rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Sum.jar /net/projects/hadoop/examples/inputs/numbers-small step-31-out |
less step-31-out/part-* | less step-31-out/part-* |
* ''clusters.file'' -- file where to read the initial clusters from | * ''clusters.file'' -- file where to read the initial clusters from |
You can download and compile it using: | You can download and compile it using: |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java.java | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java |
# NOW VIEW THE FILE | # NOW VIEW THE FILE |
# $EDITOR KMeans.java.java | # $EDITOR KMeans.java |
make -f /net/projects/hadoop/java/Makefile KMeans.java.java | make -f /net/projects/hadoop/java/Makefile KMeans.java |
You can run it using //C// machines on the following input data: | You can run it using a specified number of machines on the following input data: |
* ''/net/projects/hadoop/examples/inputs/points-small'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=50 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-small/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-small C` /net/projects/hadoop/examples/inputs/points-small step-31-out</code> | * ''/net/projects/hadoop/examples/inputs/points-small'': |
* ''/net/projects/hadoop/examples/inputs/points-medium'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=100 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-medium/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-medium C` /net/projects/hadoop/examples/inputs/points-medium step-31-out</code> | <code>M=machines; K=50; INPUT=/net/projects/hadoop/examples/inputs/points-small/points.txt |
* ''/net/projects/hadoop/examples/inputs/points-large'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=200 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-large/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-large C` /net/projects/hadoop/examples/inputs/points-large step-31-out</code> | rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M] `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code> |
| * ''/net/projects/hadoop/examples/inputs/points-medium'': |
| <code>M=machines; K=100; INPUT=/net/projects/hadoop/examples/inputs/points-medium/points.txt |
| rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M] `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code> |
| * ''/net/projects/hadoop/examples/inputs/points-large'': |
| <code>M=machines; K=200; INPUT=/net/projects/hadoop/examples/inputs/points-large/points.txt |
| rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M] `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code> |
| |
Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}. | Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}. |
| |