[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
courses:mapreduce-tutorial:step-16 [2012/02/06 09:25]
straka
courses:mapreduce-tutorial:step-16 [2012/02/06 13:29] (current)
straka
Line 75: Line 75:
    
 To run on a cluster with //C// machines using //C// mappers: To run on a cluster with //C// machines using //C// mappers:
-  rm -rf step-16-out; perl sum.pl -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-16-out+  rm -rf step-16-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; perl sum.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-16-out
   less step-16-out/part-*   less step-16-out/part-*
  
Line 90: Line 90:
   # NOW VIEW THE FILE   # NOW VIEW THE FILE
   # $EDITOR statistics.pl   # $EDITOR statistics.pl
-  rm -rf step-16-out; perl statistics.pl -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-16-out+  rm -rf step-16-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; perl statistics.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-16-out
   less step-16-out/part-*   less step-16-out/part-*
  
Line 108: Line 108:
   # NOW VIEW THE FILE   # NOW VIEW THE FILE
   # $EDITOR median.pl   # $EDITOR median.pl
-  rm -rf step-16-out; perl median.pl -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-16-out+  rm -rf step-16-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; perl median.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-16-out
   less step-16-out/part-*   less step-16-out/part-*
  
Line 126: Line 126:
 You can run it using specified number of machines on the following input data: You can run it using specified number of machines on the following input data:
   * ''/net/projects/hadoop/examples/inputs/points-small'':   * ''/net/projects/hadoop/examples/inputs/points-small'':
-<code>M=machines; export CLUSTERS_NUM=50; export CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-small/points.txt +<code>M=#of_machines; export CLUSTERS_NUM=50 CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-small/points.txt 
-rm -rf step-16-out; perl kmeans.pl [-jt jobtracker | -c $M`/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code>+rm -rf step-16-out; perl kmeans.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code> 
 +  * ''/net/projects/hadoop/examples/inputs/points-medium'': 
 +<code>M=#of_machines; export CLUSTERS_NUM=100 CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-medium/points.txt 
 +rm -rf step-16-out; perl kmeans.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code> 
 +  * ''/net/projects/hadoop/examples/inputs/points-large'': 
 +<code>M=#of_machines; export CLUSTERS_NUM=200 CLUSTERS_FILE=/net/projects/hadoop/examples/inputs/points-large/points.txt 
 +rm -rf step-16-out; perl kmeans.pl -c $M `/net/projects/hadoop/bin/compute-splitsize $CLUSTERS_FILE $M` $CLUSTERS_FILE step-16-out</code>
  
-Solution: {{:courses:mapreduce-tutorial:step-16-solution3.txt|kmeans.pl}}.+Solution: {{:courses:mapreduce-tutorial:step-16-solution3.txt|kmeans.pl}}, much faster solution with distance computations written in C: {{:courses:mapreduce-tutorial:step-16-solution3_c.txt|kmeans_C.pl}}.
  

[ Back to the navigation ] [ Back to the content ]