courses:mapreduce-tutorial:step-31 [2012/02/06 08:40] straka |
courses:mapreduce-tutorial:step-31 [2012/02/06 08:50] straka |
===== Exercise 3 ===== | ===== Exercise 3 ===== |
| |
Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'', which computes | Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/points-small'', which implements the [[http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm|K-means clustering algorithm]]. See the [[.:step-15|K-means clustering exercise]] for a description of the input data. |
| |
You can download the template {{:courses:mapreduce-tutorial:step-31-exercise3.txt|Median.java}} and execute it using: | You can download the template {{:courses:mapreduce-tutorial:step-31-exercise3.txt|KMeans.java}}. This template uses two Hadoop properties: |
| * ''clusters.num'' -- number of clusters |
| * ''clusters.file'' -- file from which the initial clusters are read |
| You can download and compile it using: |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java.java | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java.java |
# NOW VIEW THE FILE | # NOW VIEW THE FILE |
# $EDITOR KMeans.java.java | # $EDITOR KMeans.java.java |
make -f /net/projects/hadoop/java/Makefile KMeans.java.java | make -f /net/projects/hadoop/java/Makefile KMeans.java.java |
rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out | You can run it using //C// machines on the following input data: |
less step-31-out/part-* | * ''/net/projects/hadoop/examples/inputs/points-small'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=50 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-small/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-small C` /net/projects/hadoop/examples/inputs/points-small step-31-out</code> |
| * ''/net/projects/hadoop/examples/inputs/points-medium'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=100 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-medium/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-medium C` /net/projects/hadoop/examples/inputs/points-medium step-31-out</code> |
| * ''/net/projects/hadoop/examples/inputs/points-large'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=200 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-large/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-large C` /net/projects/hadoop/examples/inputs/points-large step-31-out</code> |
| |
Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}. | Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}. |
| |
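The structure of one K-means iteration in an AllReduce setting can be sketched in plain Java, without Hadoop. This is only an illustration under stated assumptions, not the template or the linked solution: the class ''KMeansSketch'' and all its method names are hypothetical, points are assumed to be ''double[]'' vectors, and the AllReduce step is simulated by the local method ''allReduceSum'', which adds up the per-worker arrays. Each worker computes, for every cluster, the coordinate sums and the count of its own points; after the sums are combined, every worker can derive the same new centroids.

```java
// Hypothetical sketch: K-means iterations where the cross-worker
// AllReduce is simulated by a local element-wise sum.
final class KMeansSketch {
    /** Index of the centroid nearest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    /** One worker's partial result. Layout per cluster: dim coordinate sums, then the point count. */
    static double[] partialSums(double[][] points, double[][] centroids) {
        int k = centroids.length, dim = centroids[0].length;
        double[] sums = new double[k * (dim + 1)];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            for (int i = 0; i < dim; i++) sums[c * (dim + 1) + i] += p[i];
            sums[c * (dim + 1) + dim] += 1;
        }
        return sums;
    }

    /** Stand-in for the AllReduce call (assumption): element-wise sum over all workers. */
    static double[] allReduceSum(double[][] perWorker) {
        double[] total = new double[perWorker[0].length];
        for (double[] w : perWorker)
            for (int i = 0; i < total.length; i++) total[i] += w[i];
        return total;
    }

    /** New centroid = coordinate sums / count; an empty cluster keeps its old centroid. */
    static double[][] updateCentroids(double[] total, double[][] old) {
        int k = old.length, dim = old[0].length;
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            double count = total[c * (dim + 1) + dim];
            for (int i = 0; i < dim; i++)
                next[c][i] = count > 0 ? total[c * (dim + 1) + i] / count : old[c][i];
        }
        return next;
    }

    /** Runs a fixed number of iterations; workers[w] holds the points of worker w. */
    static double[][] iterate(double[][][] workers, double[][] centroids, int iterations) {
        for (int it = 0; it < iterations; it++) {
            double[][] partial = new double[workers.length][];
            for (int w = 0; w < workers.length; w++)
                partial[w] = partialSums(workers[w], centroids);
            centroids = updateCentroids(allReduceSum(partial), centroids);
        }
        return centroids;
    }
}
```

In an actual job, the summation done here by ''allReduceSum'' would be carried out across machines by the framework, and each worker would only ever see its own split of the input. The fixed iteration count is a simplification; stopping when the centroids no longer move is a common alternative.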