courses:mapreduce-tutorial:step-31 [2012/02/06 08:21] straka
courses:mapreduce-tutorial:step-31 [2012/02/06 08:40] straka

===== Example =====
This example reads the keys of ''/net/projects/hadoop/examples/inputs/numbers-small'', computes the sum of all the keys and prints it:
<code java Sum.java>
import org.apache.hadoop.mapreduce.*;

To run on a cluster with //C// machines using //C// mappers:
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Sum.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*
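To make the AllReduce pattern behind this example concrete, here is a single-process Java sketch of what the job computes: every simulated mapper sums the keys of its own split, the partial sums are combined, and every mapper receives the same total. It is not the tutorial's AllReduce API (those calls are not shown here); the class and method names are hypothetical and the splits are made-up sample data.

<code java>
import java.util.Arrays;
import java.util.List;

// Single-process simulation of the AllReduce sum pattern; hypothetical names,
// not the tutorial's Hadoop AllReduce API.
class AllReduceSumSketch {
    // What one mapper does: sum only the keys of its own input split.
    static long localSum(int[] split) {
        long sum = 0; // long, so that many int keys cannot overflow
        for (int key : split) sum += key;
        return sum;
    }

    // Combine all partial sums; a real AllReduce hands this same total back
    // to every mapper, modelled here as one copy of the total per mapper.
    static long[] allReduceSum(List<int[]> splits) {
        long total = 0;
        for (int[] split : splits) total += localSum(split);
        long[] perMapper = new long[splits.size()];
        Arrays.fill(perMapper, total);
        return perMapper;
    }

    public static void main(String[] args) {
        // Made-up sample data: three splits, one per simulated mapper.
        List<int[]> splits = List.of(new int[] {1, 2, 3}, new int[] {10, 20}, new int[] {100});
        System.out.println(Arrays.toString(allReduceSum(splits))); // prints [136, 136, 136]
    }
}
</code>

The key property is that every mapper ends up holding the same global result, so all of them can continue computing with it (which is what the iterative exercises below rely on).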

===== Exercise 1 =====

Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'', which computes:
  * number of keys
  * mean of the keys
  * variance of the keys
  * minimum of the keys
  * maximum of the keys
You can download the template {{:courses:mapreduce-tutorial:step-31-exercise1.txt|Statistics.java}} and execute it using:
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise1.txt' -O Statistics.java
  # NOW VIEW THE FILE
  # $EDITOR Statistics.java
  make -f /net/projects/hadoop/java/Makefile Statistics.java
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Statistics.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*
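All five statistics of Exercise 1 can be derived from a few partial aggregates that each mapper computes from its own split and that combine associatively: count, sum, sum of squares, minimum and maximum. The following is a single-process sketch of that idea, not the template's code; the class ''Partial'' and all method names are hypothetical, the AllReduce is simulated by a loop, and "variance" is assumed to mean the population variance E[X^2] - E[X]^2.

<code java>
import java.util.List;

// Sketch of the Exercise 1 statistics computed from per-split partial
// aggregates; the AllReduce step is simulated in a single process.
// Partial and all method names are hypothetical.
class StatisticsSketch {
    // What one mapper can compute from its own split alone.
    static final class Partial {
        final long count;
        final long sum;            // exact for int keys
        final double sumOfSquares; // double: a long could overflow
        final int min;
        final int max;

        Partial(long count, long sum, double sumOfSquares, int min, int max) {
            this.count = count;
            this.sum = sum;
            this.sumOfSquares = sumOfSquares;
            this.min = min;
            this.max = max;
        }

        static Partial of(int[] keys) {
            long count = 0, sum = 0;
            double sumOfSquares = 0;
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int key : keys) {
                count++;
                sum += key;
                sumOfSquares += (double) key * key;
                min = Math.min(min, key);
                max = Math.max(max, key);
            }
            return new Partial(count, sum, sumOfSquares, min, max);
        }

        // Associative and commutative, so the reduction order does not matter.
        Partial combine(Partial o) {
            return new Partial(count + o.count, sum + o.sum, sumOfSquares + o.sumOfSquares,
                    Math.min(min, o.min), Math.max(max, o.max));
        }
    }

    // Stands in for AllReduce: every mapper would receive this same total.
    static Partial allReduce(List<int[]> splits) {
        Partial total = Partial.of(new int[0]); // identity element of combine
        for (int[] keys : splits) total = total.combine(Partial.of(keys));
        return total;
    }

    static double mean(Partial p) {
        return (double) p.sum / p.count;
    }

    // Population variance E[X^2] - E[X]^2.
    static double variance(Partial p) {
        double mean = mean(p);
        return p.sumOfSquares / p.count - mean * mean;
    }

    public static void main(String[] args) {
        Partial p = allReduce(List.of(new int[] {1, 2, 3}, new int[] {4, 5}));
        // prints 5 3.0 2.0 1 5
        System.out.println(p.count + " " + mean(p) + " " + variance(p) + " " + p.min + " " + p.max);
    }
}
</code>

The formula E[X^2] - E[X]^2 is the simplest to reduce, but it can lose precision on large inputs with a large mean; combining per-split (count, mean, sum of squared deviations) triples, as in the parallel variant of Welford's algorithm, is a numerically safer alternative.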

===== Exercise 2 =====

Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'', which computes the median of the input data. You can use the following iterative algorithm:
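  * At the beginning, set //min<sub>1</sub>// = ''Integer.MIN_VALUE'', //max<sub>1</sub>// = ''Integer.MAX_VALUE'' and //index_to_find// = (number_of_input_data + 1) / 2, using integer division. Indices are 1-based, so this is the index of the (lower) median for both odd and even numbers of keys.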
  * In step //i//, do the following:
    - Consider only the input keys in the range <//min<sub>i</sub>//, //max<sub>i</sub>//>.
    - Compute //split// = the ceiling of the mean of these keys.
    - If //index_to_find// is in the range <1 + number of keys less than //split//, number of keys less than or equal to //split//>, then //split// is the median.
    - Else, if //index_to_find// is at most the number of keys less than //split//, set //max<sub>i+1</sub>// = //split// - 1 and keep //min<sub>i+1</sub>// = //min<sub>i</sub>//.
    - Else, set //min<sub>i+1</sub>// = //split// + 1, keep //max<sub>i+1</sub>// = //max<sub>i</sub>//, and subtract the number of keys less than or equal to //split// from //index_to_find//.
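The iterative algorithm above can be simulated in a single process as sketched below. Each step needs two AllReduce rounds: one for the count and sum of the keys in range (to obtain //split//), and one for the numbers of keys less than and less than or equal to //split//. Here both rounds are simulated by loops over in-memory splits; the class and method names are hypothetical, and this is not the template's or the solution's code. The sketch assumes the 1-based index (n + 1) / 2 (integer division), i.e. the lower median.

<code java>
import java.util.List;

// Single-process sketch of the iterative median algorithm of Exercise 2.
// Each "AllReduce" is simulated by summing counters over all input splits.
// Class and method names are hypothetical.
class MedianSketch {
    static int median(List<int[]> splits) {
        long n = 0;
        for (int[] part : splits) n += part.length;
        if (n == 0) throw new IllegalArgumentException("no input keys");

        // long bounds avoid int overflow when computing split - 1 and split + 1.
        long min = Integer.MIN_VALUE, max = Integer.MAX_VALUE;
        long indexToFind = (n + 1) / 2; // assumed: 1-based index of the lower median

        while (true) {
            // AllReduce #1: count and sum of the keys in <min, max>.
            long count = 0, sum = 0;
            for (int[] part : splits)
                for (int key : part)
                    if (key >= min && key <= max) { count++; sum += key; }
            // Ceiling of sum / count that is also correct for negative sums.
            long split = -Math.floorDiv(-sum, count);

            // AllReduce #2: keys in range that are < split and <= split.
            long less = 0, lessEq = 0;
            for (int[] part : splits)
                for (int key : part)
                    if (key >= min && key <= max) {
                        if (key < split) less++;
                        if (key <= split) lessEq++;
                    }

            if (indexToFind >= less + 1 && indexToFind <= lessEq) return (int) split;
            if (indexToFind <= less) {
                max = split - 1;           // the median is below split
            } else {
                min = split + 1;           // the median is above split
                indexToFind -= lessEq;     // skip the keys <= split
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(median(List.of(new int[] {1, 100}, new int[] {2, 3, 4}))); // prints 3
    }
}
</code>

For the keys 1, 100, 2, 3, 4 the first //split// is ceil(110 / 5) = 22, which leaves too many keys below it, so the range shrinks to <''Integer.MIN_VALUE'', 21>; the second //split// is ceil(10 / 4) = 3, which is the median. Every step strictly shrinks the range while keeping the median inside it, so the loop terminates.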

You can download the template {{:courses:mapreduce-tutorial:step-31-exercise2.txt|Median.java}} and execute it using:
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise2.txt' -O Median.java
  # NOW VIEW THE FILE
  # $EDITOR Median.java
  make -f /net/projects/hadoop/java/Makefile Median.java
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Median.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*

Solution: {{:courses:mapreduce-tutorial:step-31-solution2.txt|Median.java}}.

===== Exercise 3 =====

Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'', which computes a K-means clustering of the input data.

You can download the template {{:courses:mapreduce-tutorial:step-31-exercise3.txt|KMeans.java}} and execute it using:
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise3.txt' -O KMeans.java
  # NOW VIEW THE FILE
  # $EDITOR KMeans.java
  make -f /net/projects/hadoop/java/Makefile KMeans.java
  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop KMeans.jar -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` /net/projects/hadoop/examples/inputs/numbers-small step-31-out
  less step-31-out/part-*

Solution: {{:courses:mapreduce-tutorial:step-31-solution3.txt|KMeans.java}}.