courses:mapreduce-tutorial:step-24 [2012/01/31 16:25] (current) dusek

  * ''net/projects/hadoop/bin/hadoop job.jar -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] [-Dname=value -Dname=value ...] input output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.

===== Exercise 1 =====
Download the ''MapperOnlyHadoopJob.java'', compile it and run it using
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-24?codeblock=1' -O 'MapperOnlyHadoopJob.java'
===== Counters =====

As in the Perl API, a mapper (or a reducer) can increment various counters by using ''context.getCounter("Group", "Name").increment(value)'':
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  context.getCounter("Mapper", "Number of keys").increment(1);
  ...
}
</code>
The ''getCounter'' method returns a [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Counter.html|Counter]] object, so if a counter is incremented frequently, ''getCounter'' can be called just once and the returned ''Counter'' reused:
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  Counter words = context.getCounter("Mapper", "Number of words");
  for (String word : value.toString().split("\\W+")) {
    ...
    words.increment(1);
  }
}
</code>

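The ''value.toString().split("\\W+")'' call above splits the input text on runs of non-word characters. Its behavior is easy to check in plain Java, without a Hadoop cluster (a standalone sketch; the class name and the sample text are made up for illustration):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // Same tokenization as in the mapper: split on runs of non-word characters.
        String value = "Three-letter words: foo, bar and baz!";
        String[] words = value.split("\\W+");
        for (String word : words) {
            System.out.println(word);
        }
        // Trailing separators (the "!") produce no trailing empty tokens.
        System.out.println("count=" + words.length);
    }
}
```

Note one regex pitfall: trailing empty strings are removed by ''split'', but if the input *starts* with a separator, the first returned token is an empty string, which a real mapper may want to skip.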
===== Exercise 2 =====

Run a Hadoop job on /home/straka/wiki/cs-text-small which filters the documents so that only three-letter words remain. Also use counters to compute the histogram of word lengths and the percentage of three-letter words in the documents. You can download the template {{:courses:mapreduce-tutorial:step-24.txt|ThreeLetterWords.java}} and execute it.

  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-24.txt' -O 'ThreeLetterWords.java'
  # NOW VIEW THE FILE
  # $EDITOR ThreeLetterWords.java
  make -f /net/projects/hadoop/java/Makefile ThreeLetterWords.jar
  rm -rf step-24-out-sol; /net/projects/hadoop/bin/hadoop ThreeLetterWords.jar -r 0 /home/straka/wiki/cs-text-small step-24-out-sol
  less step-24-out-sol/part-*

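The counter arithmetic for the exercise can be prototyped offline in plain Java before wiring it into the job (a standalone sketch with made-up sample text; in the actual mapper each ''lengths.merge(...)'' call would correspond to an increment of a Hadoop counter such as ''context.getCounter("Length", ...)'', where the group name "Length" is our assumption, not something prescribed by the template):

```java
import java.util.Map;
import java.util.TreeMap;

public class LengthHistogram {
    public static void main(String[] args) {
        String text = "one two three four five six seven";
        Map<Integer, Long> lengths = new TreeMap<>();  // word length -> count
        long total = 0, threeLetter = 0;
        for (String word : text.split("\\W+")) {
            if (word.isEmpty()) continue;              // skip empty token after a leading separator
            lengths.merge(word.length(), 1L, Long::sum);
            total++;
            if (word.length() == 3) threeLetter++;
        }
        System.out.println("histogram=" + lengths);
        System.out.printf("three-letter words: %.1f%%%n", 100.0 * threeLetter / total);
    }
}
```

In the Hadoop job itself the final division has to happen after the job finishes, in the driver, because counters are only aggregated across all tasks once the job completes.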
----