courses:mapreduce-tutorial:step-8 [2012/01/29 21:04] straka |
courses:mapreduce-tutorial:step-8 [2012/01/29 23:54] majlis |
By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values for a given key are processed by a single reducer. | By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values for a given key are processed by a single reducer. |
| |
To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper, together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs: | To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper, together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs. |
| |
| A partitioner should be provided if |
| * the default partitioner fails to distribute the data evenly among the reducers, i.e., some of the reducers process much more data than others. |
| * you need explicit control over (key, value) placement. This can happen, for example, when [[.:step-13|sorting data]]. |
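The two situations above can be illustrated with a minimal, standalone Perl sketch. It does not use the tutorial's Hadoop Perl API: the function names ''simple_hash'', ''default_partition'' and ''range_partition'' are hypothetical, and the hash shown is only a stand-in, not the hash Hadoop actually uses. Each function takes a key, a value and the number of reducers and returns a zero-based reducer number.

```perl
use strict;
use warnings;

# Illustrative string hash (an assumption, NOT Hadoop's real hash function).
# Any deterministic hash would serve the same purpose here.
sub simple_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % 2147483647 for split //, $key;
    return $h;
}

# Default-style behaviour: hash(key) modulo number_of_reducers.
# The same key always maps to the same reducer.
sub default_partition {
    my ($key, $value, $reducers) = @_;
    return simple_hash($key) % $reducers;
}

# Explicit placement: split the alphabet a..z into contiguous ranges,
# one range per reducer. Keys not starting with a..z go to reducer 0.
sub range_partition {
    my ($key, $value, $reducers) = @_;
    my $c   = lc substr($key, 0, 1);
    my $idx = ($c ge 'a' && $c le 'z') ? ord($c) - ord('a') : 0;
    return int($idx * $reducers / 26);
}

# Show where a few sample keys would be sent with 2 reducers.
for my $key (qw(apple mango nut zebra)) {
    printf "%-6s default=%d range=%d\n", $key,
        default_partition($key, 1, 2), range_partition($key, 1, 2);
}
```

With the range-based variant, every key on reducer 0 sorts before every key on reducer 1, which is the kind of explicit control the default hash-based placement cannot give.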
| |
<code perl> | <code perl> |
Run one MR job on '/home/straka/wiki/cs-text-medium', which creates two output files -- one with an ascending list of unique article names and the other with an ascending list of unique words. You can download the template {{:courses:mapreduce-tutorial:step-8-exercise.txt|step-8-exercise.pl}} and execute it. | Run one MR job on '/home/straka/wiki/cs-text-medium', which creates two output files -- one with an ascending list of unique article names and the other with an ascending list of unique words. You can download the template {{:courses:mapreduce-tutorial:step-8-exercise.txt|step-8-exercise.pl}} and execute it. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-exercise.txt' -O 'step-8-exercise.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-exercise.txt' -O 'step-8-exercise.pl' |
| # NOW EDIT THE FILE |
| # $EDITOR step-8-exercise.pl |
rm -rf step-8-out-ex; perl step-8-exercise.pl run /home/straka/wiki/cs-text-medium/ step-8-out-ex | rm -rf step-8-out-ex; perl step-8-exercise.pl run /home/straka/wiki/cs-text-medium/ step-8-out-ex |
less step-8-out-ex/part-* | less step-8-out-ex/part-* |
You can also download the solution {{:courses:mapreduce-tutorial:step-8-solution.txt|step-8-solution.pl}} and check the correct output. | You can also download the solution {{:courses:mapreduce-tutorial:step-8-solution.txt|step-8-solution.pl}} and check the correct output. |
wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-solution.txt' -O 'step-8-solution.pl' | wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-solution.txt' -O 'step-8-solution.pl' |
| # NOW VIEW THE FILE |
| # $EDITOR step-8-solution.pl |
rm -rf step-8-out-sol; perl step-8-solution.pl run /home/straka/wiki/cs-text-medium/ step-8-out-sol | rm -rf step-8-out-sol; perl step-8-solution.pl run /home/straka/wiki/cs-text-medium/ step-8-out-sol |
less step-8-out-sol/part-* | less step-8-out-sol/part-* |