courses:mapreduce-tutorial:step-8 [2012/01/29 21:04] straka |
courses:mapreduce-tutorial:step-8 [2012/01/31 09:36] straka The number of reducers must be specified in the exercise. |
By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values for a given key are processed by a single reducer. | By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values for a given key are processed by a single reducer. |
| |
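The default placement rule can be sketched in a few lines of standalone Perl. This is only an illustration: ''simple_hash'' is a hypothetical stand-in for whatever hash function the framework actually uses, and ''default_partition'' is not part of the Hadoop Perl API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the framework's hash function:
# a simple polynomial string hash kept within 32 bits.
sub simple_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % 2**32 for split //, $key;
    return $h;
}

# Default-style placement: hash(key) modulo number_of_reducers.
# The result is a zero-based reducer number.
sub default_partition {
    my ($key, $reducers) = @_;
    return simple_hash($key) % $reducers;
}

# The same key always maps to the same reducer.
for my $key (qw(apple banana apple cherry)) {
    printf "%s -> reducer %d\n", $key, default_partition($key, 3);
}
```

Because the reducer number depends only on the key, both occurrences of ''apple'' land on the same reducer, whatever the concrete hash function is.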
To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs: | To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs. |
| |
| A partitioner should be provided if |
| * the default partitioner fails to distribute the data evenly between reducers, i.e., some of the reducers operate on much more data than others. |
| * you need explicit control over (key, value) placement. This can happen, for example, when [[.:step-13|sorting data]]. |
| |
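As an illustration of explicit placement, the following standalone sketch assigns keys to reducers by their first letter, so that reducer 0 receives the alphabetically smallest keys and the last reducer the largest. The function name ''range_partition'' and its argument order are assumptions made for this example; it is a plain function, not the method required by the ''Hadoop::Partitioner'' role, and it only handles ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical range partitioner: splits the alphabet a..z into
# $reducers consecutive ranges based on the first letter of the key.
# Keys not starting with an ASCII letter are clamped to the first
# or last range.
sub range_partition {
    my ($key, $value, $reducers) = @_;
    my $idx = ord(lc substr($key, 0, 1)) - ord('a');
    $idx = 0  if $idx < 0;
    $idx = 25 if $idx > 25;
    return int($idx * $reducers / 26);   # zero-based reducer number
}

for my $key (qw(apple mango night zebra)) {
    printf "%s -> reducer %d\n", $key, range_partition($key, undef, 2);
}
```

With such a partitioner every key on reducer //i// starts with a letter no greater than the first letter of any key on reducer //i+1//, which is the kind of control the default hash-based placement cannot provide.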
<code perl> | <code perl> |
package Partitioner; | package My::Partitioner; |
use Moose; | use Moose; |
with 'Hadoop::Partitioner'; | with 'Hadoop::Partitioner'; |
| |
... | ... |
package Main; | package main; |
use Hadoop::Runner; | use Hadoop::Runner; |
| |
my $runner = Hadoop::Runner->new( | my $runner = Hadoop::Runner->new( |
... | ... |
partitioner => Partitioner->new(), | partitioner => My::Partitioner->new(), |
...); | ...); |
... | ... |
Run one MR job on '/home/straka/wiki/cs-text-medium', which creates two output files -- one with an ascending list of unique article names and the other with an ascending list of unique words. You can download the template {{:courses:mapreduce-tutorial:step-8-exercise.txt|step-8-exercise.pl}} and execute it. | Run one MR job on '/home/straka/wiki/cs-text-medium', which creates two output files -- one with an ascending list of unique article names and the other with an ascending list of unique words. You can download the template {{:courses:mapreduce-tutorial:step-8-exercise.txt|step-8-exercise.pl}} and execute it. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-exercise.txt' -O 'step-8-exercise.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-exercise.txt' -O 'step-8-exercise.pl' |
rm -rf step-8-out-ex; perl step-8-exercise.pl run /home/straka/wiki/cs-text-medium/ step-8-out-ex | # NOW EDIT THE FILE |
| # $EDITOR step-8-exercise.pl |
| rm -rf step-8-out-ex; perl step-8-exercise.pl -c 2 -r 2 /home/straka/wiki/cs-text-medium/ step-8-out-ex |
less step-8-out-ex/part-* | less step-8-out-ex/part-* |
| |
You can also download the solution {{:courses:mapreduce-tutorial:step-8-solution.txt|step-8-solution.pl}} and check the correct output. | You can also download the solution {{:courses:mapreduce-tutorial:step-8-solution.txt|step-8-solution.pl}} and check the correct output. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-solution.txt' -O 'step-8-solution.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-solution.txt' -O 'step-8-solution.pl' |
rm -rf step-8-out-sol; perl step-8-solution.pl run /home/straka/wiki/cs-text-medium/ step-8-out-sol | # NOW VIEW THE FILE |
| # $EDITOR step-8-solution.pl |
| rm -rf step-8-out-sol; perl step-8-solution.pl -c 2 -r 2 /home/straka/wiki/cs-text-medium/ step-8-out-sol |
less step-8-out-sol/part-* | less step-8-out-sol/part-* |
| |