
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-31 [2012/02/06 08:18]
straka
courses:mapreduce-tutorial:step-31 [2012/02/06 08:26]
straka
Line 31: Line 31:
  
 It is crucial that all the mappers run simultaneously. This can be achieved using the ''/net/projects/hadoop/bin/compute-splitsize'' script: for a given Hadoop input and a requested number of mappers, it computes the appropriate split size.
 +
 +When the computation finishes, only one of the mappers should print the results, because all of them hold the same results. For simplicity, the ''cooperate'' method has a ''boolean shouldWrite'' argument, which is set to true in exactly one mapper.
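
The role of ''shouldWrite'' can be illustrated without Hadoop. The following is only a minimal sketch in plain Java, not the tutorial's AllReduce API: the class ''ShouldWriteSketch'', the method ''run'', and the choice of mapper 0 as the writer are all hypothetical names and assumptions made for illustration. Each row of the input array stands in for one mapper's split.

```java
import java.util.ArrayList;
import java.util.List;

public class ShouldWriteSketch {
    // Simulates the mappers sequentially; each row of `partitions` plays the
    // role of one mapper's input split (hypothetical stand-in, no Hadoop).
    static List<String> run(long[][] partitions) {
        int mappers = partitions.length;

        // Phase 1: every simulated mapper sums its own keys.
        long[] partial = new long[mappers];
        for (int m = 0; m < mappers; m++) {
            for (long key : partitions[m]) {
                partial[m] += key;
            }
        }

        // Phase 2: the all-reduce step. Afterwards every mapper holds the
        // same global total, modelled here as one shared value.
        long total = 0;
        for (long p : partial) {
            total += p;
        }

        // Phase 3: only the mapper whose shouldWrite flag is true emits output.
        List<String> output = new ArrayList<>();
        for (int m = 0; m < mappers; m++) {
            boolean shouldWrite = (m == 0); // assumption: mapper 0 is the writer
            if (shouldWrite) {
                output.add("sum\t" + total);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        long[][] partitions = { {1, 2}, {3}, {4, 5} };
        for (String line : run(partitions)) {
            System.out.println(line);
        }
    }
}
```

Without the flag, each of the three simulated mappers would emit the same line, so the output would contain the result three times.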
  
 ===== Example =====
-This example reads the keys of ''/net/projects/hadoop/examples/inputs/numbers-small/numbers.txt'', computes the sum of all the keys, and prints it:
+This example reads the keys of ''/net/projects/hadoop/examples/inputs/numbers-small'', computes the sum of all the keys, and prints it:
 <code java Sum.java>
 import org.apache.hadoop.mapreduce.*;
Line 106: Line 108:
 To run on a cluster with //C// machines using //C// mappers:
   rm -rf step-31-out; /net/projects/hadoop/bin/hadoop -c C `/net/projects/hadoop/bin/compute-splitsize /net/projects/hadoop/examples/inputs/numbers-small C` Sum.jar /net/projects/hadoop/examples/inputs/numbers-small step-31-out
 +  less step-31-out/part-*
 +
 +===== Exercise 1 =====
 +
 +Implement an AllReduce job on ''/net/projects/hadoop/examples/inputs/numbers-small'' that computes the following:
 +  * number of keys
 +  * mean of the keys
 +  * variance of the keys
 +  * minimum of the keys
 +  * maximum of the keys
 +You can download the template {{:courses:mapreduce-tutorial:step-31-exercise1.txt|Statistics.java}} and execute it using:
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-31-exercise1.txt' -O Statistics.java
 +  # NOW VIEW THE FILE
 +  # $EDITOR Statistics.java
 +  make -f /net/projects/hadoop/java/Makefile Statistics.java
 +  rm -rf step-31-out; /net/projects/hadoop/bin/hadoop Statistics.jar
   less step-31-out/part-*
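
One possible way to approach Exercise 1 is to have every mapper keep a small set of partial values that can be combined across mappers: count, sum, sum of squares, minimum, and maximum. The sketch below is plain Java and does not use the ''Statistics.java'' template or the AllReduce API; the class ''StatisticsSketch'' and its methods are hypothetical. It assumes the population variance (dividing by //n//), because the exercise does not say which variance is expected, and it uses the simple sum-of-squares formula, which can lose precision for large keys.

```java
public class StatisticsSketch {
    long count = 0;
    double sum = 0.0;
    double sumSquares = 0.0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    // Folds one key into the partial statistics of a single mapper.
    void add(long key) {
        count++;
        sum += key;
        sumSquares += (double) key * key;
        min = Math.min(min, key);
        max = Math.max(max, key);
    }

    // Combines the partial statistics of another mapper into this one;
    // this is the operation an all-reduce step would apply across mappers.
    void merge(StatisticsSketch other) {
        count += other.count;
        sum += other.sum;
        sumSquares += other.sumSquares;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }

    double mean() {
        return sum / count;
    }

    // Population variance: E[x^2] - (E[x])^2 (assumption, see lead-in).
    double variance() {
        double m = mean();
        return sumSquares / count - m * m;
    }

    public static void main(String[] args) {
        StatisticsSketch a = new StatisticsSketch();
        a.add(1);
        a.add(2);
        StatisticsSketch b = new StatisticsSketch();
        b.add(3);
        b.add(4);
        a.merge(b);
        System.out.println("count\t" + a.count);
        System.out.println("mean\t" + a.mean());
        System.out.println("variance\t" + a.variance());
        System.out.println("min\t" + a.min);
        System.out.println("max\t" + a.max);
    }
}
```

Because all five partial values combine with either addition, minimum, or maximum, the merged result does not depend on how the keys were split among the mappers.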
  
