courses:mapreduce-tutorial:step-24 [2012/01/31 11:48] straka
  * ''net/projects/hadoop/bin/hadoop job.jar -c number_of_machines [-w secs_to_wait_after_job_finishes] [-r number_of_reducers] [-Dname=value -Dname=value ...] input output_path'' -- creates a new cluster with the specified number of machines, executes the given job on it, and then waits the specified number of seconds before stopping the cluster.

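For instance, a hypothetical invocation (the machine count, wait time, reducer count, and paths below are illustrative, not prescribed by the tutorial) could look like:

  net/projects/hadoop/bin/hadoop job.jar -c 10 -w 60 -r 4 input_dir output_dir
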
===== Exercise 1 =====
Download ''MapperOnlyHadoopJob.java'', compile it, and run it using
  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_export/code/courses:mapreduce-tutorial:step-24?codeblock=1' -O 'MapperOnlyHadoopJob.java'
===== Counters =====

As in the Perl API, a mapper (or a reducer) can increment various counters by using ''context.getCounter("Group", "Name").increment(value)'':
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  context.getCounter("Group", "Name").increment(value);
  ...
}
</code>
The ''getCounter'' method returns a [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Counter.html|Counter]] object, so if a counter is incremented frequently, the ''getCounter'' method can be called only once:
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  Counter words = context.getCounter("Mapper", "Number of words");
  for (String word : value.toString().split("\\W+")) {
    ...
    words.increment(1);
  }
}
</code>

===== Exercise 2 =====

Run a Hadoop job on /home/straka/wiki/cs-text-small which filters the documents so that only three-letter words remain. Also use counters to compute a histogram of word lengths and the percentage of three-letter words in the documents. You can download the template {{:courses:mapreduce-tutorial:step-24.txt|ThreeLetterWords.java}} and execute it.
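Before turning to the template, the counting logic itself can be sketched in plain Java (no Hadoop dependencies; the class name and sample text below are illustrative, not part of the tutorial). In the real job, the loop body would run inside ''map'' and the tallies would be kept in Hadoop counters instead of local variables:

<code java>
import java.util.HashMap;
import java.util.Map;

public class ThreeLetterWordsSketch {
    public static void main(String[] args) {
        // Illustrative input; the exercise reads /home/straka/wiki/cs-text-small instead.
        String document = "one two six foo ab abcd cat dog";

        Map<Integer, Long> lengthHistogram = new HashMap<>(); // word length -> count
        long totalWords = 0, threeLetterWords = 0;
        StringBuilder filtered = new StringBuilder();

        for (String word : document.split("\\W+")) {
            if (word.isEmpty()) continue;
            totalWords++;
            lengthHistogram.merge(word.length(), 1L, Long::sum);
            if (word.length() == 3) {          // keep only three-letter words
                threeLetterWords++;
                if (filtered.length() > 0) filtered.append(' ');
                filtered.append(word);
            }
        }

        System.out.println("Filtered text: " + filtered);
        System.out.println("Length histogram: " + lengthHistogram);
        System.out.printf("Three-letter words: %d of %d (%.1f%%)%n",
                threeLetterWords, totalWords, 100.0 * threeLetterWords / totalWords);
    }
}
</code>

In the Hadoop version, each tally would be a counter increment, e.g. ''context.getCounter("Histogram", "Length 3").increment(1)'' (the group and counter names here are illustrative).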

TODO

----