To run on a cluster with //M// machines using //M// mappers:
<code>rm -rf step-16-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; perl sum.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-16-out
less step-16-out/part-*

# NOW VIEW THE FILE
# $EDITOR statistics.pl
rm -rf step-16-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; perl statistics.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-16-out
less step-16-out/part-*

# NOW VIEW THE FILE
# $EDITOR median.pl
rm -rf step-16-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; perl median.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-16-out
less step-16-out/part-*</code>
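The ''compute-splitsize'' helper is passed the input and the number of machines so that the input ends up divided into roughly //M// splits, i.e. one mapper per machine. A minimal Perl sketch of that idea, assuming a single input file and the classic ''mapred.min.split.size'' Hadoop option (the real ''/net/projects/hadoop/bin/compute-splitsize'' may compute or print its options differently):

<code perl>
#!/usr/bin/env perl
# Sketch only: assumes one input file and prints a Hadoop option forcing
# splits of at least ceil(file_size / machines) bytes, i.e. about M splits.
# The real compute-splitsize script may behave differently.
use strict;
use warnings;

my ($input, $machines) = @ARGV;
die "usage: $0 input_file machines\n" unless defined $machines;

my $size  = -s $input or die "Cannot stat $input\n";
my $split = int(($size + $machines - 1) / $machines);   # ceil(size / machines)
print "-Dmapred.min.split.size=$split\n";
</code>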
You can run it with a specified number of machines on the following input data (a sketch of the per-iteration k-means logic is shown after the list):
  * ''/net/projects/hadoop/examples/inputs/points-small'':
<code>M=#of_machines; export CLUSTERS_NUM=50 CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-small/points.txt
rm -rf step-16-out; perl kmeans.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code>
  * ''/net/projects/hadoop/examples/inputs/points-medium'':
<code>M=#of_machines; export CLUSTERS_NUM=100 CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-medium/points.txt
rm -rf step-16-out; perl kmeans.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code>
  * ''/net/projects/hadoop/examples/inputs/points-large'':
<code>M=#of_machines; export CLUSTERS_NUM=200 CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-large/points.txt
rm -rf step-16-out; perl kmeans.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code>
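The per-iteration logic that ''kmeans.pl'' has to implement is the standard one: the mapper assigns every point to the nearest of the current centroids, and the reducer replaces each centroid by the mean of the points assigned to it. A plain-Perl sketch of these two computations (deliberately not using the tutorial's Hadoop Perl API; ''nearest_cluster'' and ''new_centroid'' are illustrative names only):

<code perl>
#!/usr/bin/env perl
# Plain-Perl sketch of the two per-iteration k-means computations.
# Points and centroids are array refs of coordinates.
use strict;
use warnings;

# Mapper side: index of the centroid closest to the point (squared
# Euclidean distance); the mapper would emit (that index, the point).
sub nearest_cluster {
    my ($point, $centroids) = @_;
    my ($best, $best_dist);
    for my $i (0 .. $#$centroids) {
        my $dist = 0;
        $dist += ($point->[$_] - $centroids->[$i][$_]) ** 2 for 0 .. $#$point;
        ($best, $best_dist) = ($i, $dist) if !defined($best_dist) || $dist < $best_dist;
    }
    return $best;
}

# Reducer side: the new centroid of a cluster is the mean of its points.
sub new_centroid {
    my @points = @_;
    my @sum = (0) x @{$points[0]};
    for my $p (@points) { $sum[$_] += $p->[$_] for 0 .. $#$p }
    return [ map { $_ / @points } @sum ];
}

# Tiny usage example.
my @centroids = ([0, 0], [10, 10]);
print nearest_cluster([9, 8], \@centroids), "\n";            # prints 1
print join(" ", @{ new_centroid([1, 1], [3, 5]) }), "\n";    # prints 2 3
</code>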
Solution: {{:courses:mapreduce-tutorial:step-16-solution3.txt|kmeans.pl}}; a much faster solution with the distance computations written in C: {{:courses:mapreduce-tutorial:step-16-solution3_c.txt|kmeans_C.pl}}.