
Institute of Formal and Applied Linguistics Wiki


courses:mapreduce-tutorial:step-31 [2012/02/06 08:50] straka
    
To run on a cluster with //C// machines using //C// mappers:
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Sum.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*
  
  # $EDITOR Statistics.java
  make -f /net/projects/hadoop/java/Makefile Statistics.java
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Statistics.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*

===== Exercise 2 =====

Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'' which computes the median of the input data. You can use the following iterative algorithm:
  * At the beginning, set //min<sub>1</sub>// = ''Integer.MIN_VALUE'', //max<sub>1</sub>// = ''Integer.MAX_VALUE'' and //index_to_find// = number_of_input_data / 2.
  * In step //i//, do the following:
    - Consider only the input keys in the range <//min<sub>i</sub>//, //max<sub>i</sub>//>.
    - Compute //split// = the ceiling of the mean of these keys.
    - If //index_to_find// is in the range <1 + number of keys less than //split//, number of keys less than or equal to //split//>, then //split// is the median.
    - Else, if //index_to_find// is at most the number of keys less than //split//, set //max<sub>i+1</sub>// = //split// - 1.
    - Else, set //min<sub>i+1</sub>// = //split// + 1 and subtract from //index_to_find// the number of keys less than or equal to //split//.
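The steps above can be sketched as a single-machine Java routine (a minimal sketch, assuming all keys fit in memory; ''MedianSketch'' and ''findKth'' are illustrative names, not part of the template). It searches for the key with a given 1-based rank, so the median of //n// keys is the key with rank (//n// + 1) / 2; the real AllReduce job would instead aggregate the counts and the sum of each step across all machines:

```java
public class MedianSketch {
    // Finds the key with the given 1-based rank by iteratively
    // narrowing the range <min, max>, as in the algorithm above.
    public static long findKth(int[] keys, long indexToFind) {
        long min = Integer.MIN_VALUE, max = Integer.MAX_VALUE;
        while (true) {
            // Pass 1: mean of the keys in the range <min, max>.
            long count = 0, sum = 0;
            for (int k : keys)
                if (k >= min && k <= max) { count++; sum += k; }
            long split = (long) Math.ceil(sum / (double) count);
            // Pass 2: rank of split within the range <min, max>.
            long less = 0, lessOrEqual = 0;
            for (int k : keys)
                if (k >= min && k <= max) {
                    if (k < split) less++;
                    if (k <= split) lessOrEqual++;
                }
            if (less + 1 <= indexToFind && indexToFind <= lessOrEqual)
                return split;            // split has the wanted rank
            else if (indexToFind <= less)
                max = split - 1;         // wanted key is below split
            else {
                min = split + 1;         // wanted key is above split
                indexToFind -= lessOrEqual;
            }
        }
    }
}
```

In the distributed version, only the two passes touch the data; the variables ''count'', ''sum'', ''less'' and ''lessOrEqual'' are exactly the quantities each machine would contribute to an AllReduce sum.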

You can download the template {{:courses:mapreduce-tutorial:step-31-exercise2.txt|Median.java}} and execute it using:
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise2.txt' -O Median.java
  # NOW VIEW THE FILE
  # $EDITOR Median.java
  make -f /net/projects/hadoop/java/Makefile Median.java
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Median.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*

Solution: {{:courses:mapreduce-tutorial:step-31-solution2.txt|Median.java}}.

===== Exercise 3 =====

Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/points-small'' which implements the [[http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm|K-means clustering algorithm]]. See the [[.:step-15|K-means clustering exercise]] for a description of the input data.
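As a reference for what one iteration has to compute, here is a minimal single-machine sketch of the standard algorithm's update step (''KMeansSketch'' and ''step'' are illustrative names; in the AllReduce job, the per-cluster sums and counts would be aggregated across machines before the centers are updated):

```java
public class KMeansSketch {
    // One iteration of the standard K-means algorithm: assign each
    // point to its nearest center, then move every center to the
    // mean of its assigned points.
    public static double[][] step(double[][] points, double[][] centers) {
        int k = centers.length, dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            // Find the nearest center by squared Euclidean distance.
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                double d = 0;
                for (int i = 0; i < dim; i++) {
                    double diff = p[i] - centers[c][i];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            counts[best]++;
            for (int i = 0; i < dim; i++) sums[best][i] += p[i];
        }
        // New centers; a cluster with no points keeps its old center.
        double[][] updated = new double[k][dim];
        for (int c = 0; c < k; c++)
            for (int i = 0; i < dim; i++)
                updated[c][i] = counts[c] > 0 ? sums[c][i] / counts[c]
                                              : centers[c][i];
        return updated;
    }
}
```

The iteration is repeated until the centers stop moving (or a fixed number of rounds is reached); only ''sums'' and ''counts'' need to be AllReduced, since every machine can then update all centers identically.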

You can download the template {{:courses:mapreduce-tutorial:step-31-exercise3.txt|KMeans.java}}. This template uses two Hadoop properties:
  * ''clusters.num'' -- the number of clusters
  * ''clusters.file'' -- the file to read the initial clusters from
You can download and compile it using:
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java
  # NOW VIEW THE FILE
  # $EDITOR KMeans.java
  make -f /net/projects/hadoop/java/Makefile KMeans.java
You can run it using //C// machines on the following input data:
  * ''/net/projects/hadoop/examples/inputs/points-small'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=50 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-small/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-small C` /net/projects/hadoop/examples/inputs/points-small step-31-out</code>
  * ''/net/projects/hadoop/examples/inputs/points-medium'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=100 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-medium/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-medium C` /net/projects/hadoop/examples/inputs/points-medium step-31-out</code>
  * ''/net/projects/hadoop/examples/inputs/points-large'': <code>rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -Dclusters.num=200 -Dclusters.file=/net/projects/hadoop/examples/inputs/points-large/points.txt -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/points-large C` /net/projects/hadoop/examples/inputs/points-large step-31-out</code>

Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}.
  
