Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-31 [2012/02/06 08:41]
straka
courses:mapreduce-tutorial:step-31 [2012/02/06 08:55]
straka
Line 149:
 ===== Exercise 3 =====
  
-Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'', which computes
 +Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/points-small'' that implements the [[http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm|K-means clustering algorithm]]. See the [[.:step-15|K-means clustering exercise]] for a description of the input data.
  
-You can download the template {{:courses:mapreduce-tutorial:step-31-exercise3.txt|KMeans.java}} and execute it using:
 +You can download the template {{:courses:mapreduce-tutorial:step-31-exercise3.txt|KMeans.java}}. This template uses two Hadoop properties:
 +  * ''clusters.num'' -- number of clusters
 +  * ''clusters.file'' -- file from which to read the initial clusters
 +You can download and compile it using:
   wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java.java
   # NOW VIEW THE FILE
   # $EDITOR KMeans.java.java
   make -f /net/projects/hadoop/java/Makefile KMeans.java.java
-  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -c `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
-  less step-31-out/part-*
 +You can run it on a specified number of machines with the following input data:
 +  * ''/net/projects/hadoop/examples/inputs/points-small'':
 +<code>M=machines; K=50; INPUT=/net/projects/hadoop/examples/inputs/points-small/points.txt 
 +rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M] `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code> 
 +  * ''/net/projects/hadoop/examples/inputs/points-medium'': 
 +<code>M=machines; K=100; INPUT=/net/projects/hadoop/examples/inputs/points-medium/points.txt 
 +rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M] `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code> 
 +  * ''/net/projects/hadoop/examples/inputs/points-large'': 
 +<code>M=machines; K=200; INPUT=/net/projects/hadoop/examples/inputs/points-large/points.txt 
 +rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.java.jar -Dclusters.num=$K -Dclusters.file=$INPUT [-jt jobtracker | -c $M] `/net/projects/hadoop/bin/compute-splitsize $INPUT $M` $INPUT step-31-out</code>
  
 Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}.
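
The exercise asks for K-means expressed as an AllReduce job. As a rough illustration of how one iteration might map onto that pattern, the sketch below simulates it in plain Java without Hadoop: each "machine" sums the coordinates and counts of its own points per nearest cluster, the partial results are added element-wise (standing in for the allreduce step), and every machine can then derive the same new centers. The class name ''KMeansAllReduceSketch'' and all of its methods are hypothetical and are not taken from the ''KMeans.java'' template, whose actual API is not shown here.

```java
import java.util.Arrays;

// Hypothetical sketch: one K-means iteration as "local sums + allreduce".
// The allreduce is simulated locally; no Hadoop classes are used.
class KMeansAllReduceSketch {

    // Index of the center closest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    // What one machine computes from its own shard: per cluster, the
    // coordinate sums in columns 0..dim-1 and the point count in column dim.
    static double[][] localSums(double[][] shard, double[][] centers) {
        int dim = centers[0].length;
        double[][] sums = new double[centers.length][dim + 1];
        for (double[] p : shard) {
            int c = nearest(p, centers);
            for (int d = 0; d < dim; d++) sums[c][d] += p[d];
            sums[c][dim] += 1;
        }
        return sums;
    }

    // Stand-in for the allreduce step: element-wise sum of all partial results.
    static double[][] allReduceSum(double[][][] partials) {
        double[][] total = new double[partials[0].length][partials[0][0].length];
        for (double[][] part : partials)
            for (int c = 0; c < total.length; c++)
                for (int j = 0; j < total[c].length; j++)
                    total[c][j] += part[c][j];
        return total;
    }

    // One full iteration: local sums on every shard, reduce, recompute centers.
    // A cluster that received no points keeps its previous center.
    static double[][] iterate(double[][][] shards, double[][] centers) {
        double[][][] partials = new double[shards.length][][];
        for (int m = 0; m < shards.length; m++)
            partials[m] = localSums(shards[m], centers);
        double[][] total = allReduceSum(partials);
        int dim = centers[0].length;
        double[][] next = new double[centers.length][dim];
        for (int c = 0; c < centers.length; c++) {
            double count = total[c][dim];
            for (int d = 0; d < dim; d++)
                next[c][d] = count > 0 ? total[c][d] / count : centers[c][d];
        }
        return next;
    }

    public static void main(String[] args) {
        // Toy 1-D data split across two simulated machines.
        double[][][] shards = { {{0.0}, {2.0}, {10.0}}, {{12.0}, {1.0}} };
        double[][] centers = { {0.0}, {10.0} };
        System.out.println(Arrays.deepToString(iterate(shards, centers)));
    }
}
```

On the toy data in ''main'', the centers move from 0 and 10 to 1 and 11. Packing the count into the same array as the coordinate sums keeps the reduction to a single element-wise addition, which is one plausible way to fit the update into one allreduce call per iteration.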
  
