Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-31 [2012/02/06 13:10]
straka
courses:mapreduce-tutorial:step-31 [2012/02/06 13:26]
straka
Line 37: Line 37:
 This example reads the keys of ''/net/projects/hadoop/examples/inputs/numbers-small'', computes the sum of all the keys and prints it:
 <code java Sum.java>
+import java.io.IOException;
+
+import org.apache.hadoop.conf.*;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapreduce.*;
 import org.apache.hadoop.mapreduce.lib.allreduce.*;
Line 106: Line 111:
   less step-31-out/part-*
 
-To run on a cluster with //C// machines using //C// mappers
-  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Sum.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
+To run on a cluster using a specified number of machines:
+  rm -rf step-31-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; /net/projects/hadoop/bin/hadoop Sum.jar -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out
   less step-31-out/part-*
  
Line 123: Line 128:
   # $EDITOR Statistics.java
   make -f /net/projects/hadoop/java/Makefile Statistics.jar
-  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Statistics.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
+  rm -rf step-31-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; /net/projects/hadoop/bin/hadoop Statistics.jar -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out
   less step-31-out/part-*
  
Line 142: Line 147:
   # $EDITOR Median.java
   make -f /net/projects/hadoop/java/Makefile Median.jar
-  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Median.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
+  rm -rf step-31-out; M=#of_machines; INPUT=/net/projects/hadoop/examples/inputs/numbers-small; /net/projects/hadoop/bin/hadoop Median.jar -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out
   less step-31-out/part-*
  
Line 158: Line 163:
   # NOW VIEW THE FILE
   # $EDITOR KMeans.java
-  make -f /net/projects/hadoop/java/Makefile KMeans.java
+  make -f /net/projects/hadoop/java/Makefile KMeans.jar
 You can run it using a specified number of machines on the following input data:
   * ''/net/projects/hadoop/examples/inputs/points-small'':
-<code>M=machines; K=50; INPUT=/net/projects/hadoop/examples/inputs/points-small/points.txt
-rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M`/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
+<code>M=#of_machines; K=50; INPUT=/net/projects/hadoop/examples/inputs/points-small/points.txt
+rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
   * ''/net/projects/hadoop/examples/inputs/points-medium'':
-<code>M=machines; K=100; INPUT=/net/projects/hadoop/examples/inputs/points-medium/points.txt
-rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M`/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
+<code>M=#of_machines; K=100; INPUT=/net/projects/hadoop/examples/inputs/points-medium/points.txt
+rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
   * ''/net/projects/hadoop/examples/inputs/points-large'':
-<code>M=machines; K=200; INPUT=/net/projects/hadoop/examples/inputs/points-large/points.txt
-rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M`/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
+<code>M=#of_machines; K=200; INPUT=/net/projects/hadoop/examples/inputs/points-large/points.txt
+rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=$K -Dclusters.file=$INPUT -c $M `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
  
 Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}.
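
The revised run commands all follow one pattern: set ''M'' and ''INPUT'', then pass ''-c $M'' together with the split size printed by ''compute-splitsize''. Note that ''M=#of_machines'' is a placeholder: pasted verbatim, the shell assigns the literal string ''#of_machines'' to ''M'', so it has to be replaced by a number first. The following is a minimal dry-run sketch of that pattern, assuming a POSIX shell. The function name ''build_run_command'' and the machine count 4 are hypothetical and not part of the tutorial. It only prints the command line and never invokes Hadoop.

```shell
#!/bin/sh
# Directory holding the tutorial's hadoop and compute-splitsize scripts
# (path taken from the commands above).
HADOOP_BIN=/net/projects/hadoop/bin

# Hypothetical helper: print (do not execute) the run command for a job.
# Arguments: jar file, number of machines, input path, output directory.
build_run_command() {
  jar=$1; machines=$2; input=$3; output=$4

  # Reject an empty or non-numeric machine count, e.g. the unreplaced
  # placeholder "#of_machines".
  case "$machines" in
    ''|*[!0-9]*)
      echo "machine count must be a number, got: $machines" >&2
      return 1
      ;;
  esac

  # The escaped backticks stay literal, so compute-splitsize is not run here;
  # it runs only when the printed line is pasted into a shell.
  printf '%s\n' "rm -rf $output; $HADOOP_BIN/hadoop $jar -c $machines \`$HADOOP_BIN/compute-splitsize $input $machines\` $input $output"
}

# Example: the Sum job on 4 machines.
build_run_command Sum.jar 4 /net/projects/hadoop/examples/inputs/numbers-small step-31-out
```

Printing the command before running it makes it easy to check that the machine count appears both after ''-c'' and as the second argument of ''compute-splitsize''.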
  
