
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


courses:mapreduce-tutorial:step-8 [2012/01/29 21:04]
straka
courses:mapreduce-tutorial:step-8 [2012/01/30 00:38]
straka Improving package names of Perl programs.
Line 21: Line 21:
 By default, a (key, value) pair is sent to reducer number //hash(key) modulo number_of_reducers//. This guarantees that all values of one key are processed by the same reducer.
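The default rule can be sketched in plain Perl as follows. This is only an illustration: ''toy_hash'' and ''default_partition'' are hypothetical names, and the hash function is a simple stand-in, not the one the framework actually uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the framework's hash function (an assumption,
# not Hadoop's real hash): a simple polynomial hash over the characters.
sub toy_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % 2**31 for split //, $key;
    return $h;
}

# The default rule from the text: hash(key) modulo number_of_reducers.
sub default_partition {
    my ($key, $number_of_reducers) = @_;
    return toy_hash($key) % $number_of_reducers;
}

# Equal keys always map to the same reducer number.
my $number_of_reducers = 4;
for my $key (qw(apple banana apple cherry)) {
    printf "%s -> reducer %d\n", $key, default_partition($key, $number_of_reducers);
}
```

Because the reducer number depends only on the key, both occurrences of ''apple'' end up at the same reducer.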
  
-To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs:
+To override the default behaviour, an MR job can specify a //partitioner//. A partitioner is given every (key, value) pair produced by a mapper together with the number of reducers, and outputs the zero-based number of the reducer to which this (key, value) pair belongs.
 + 
 +A partitioner should be provided if 
 +  * the default partitioner fails to distribute the data evenly between reducers, i.e., some reducers process much more data than others. 
 +  * you need explicit control of (key, value) placement. This can happen, for example, when [[.:step-13|sorting data]].
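As an illustration of explicit placement, here is a minimal range-partitioning sketch in plain Perl. It does not use the ''Hadoop::Partitioner'' interface: the function name ''range_partition'', its argument order, and the split points in ''@boundaries'' are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical split points (an assumption for this sketch): keys below 'h'
# go to reducer 0, keys from 'h' up to (not including) 'p' go to reducer 1,
# and all remaining keys go to reducer 2.
my @boundaries = ('h', 'p');

# Returns the zero-based reducer number for a key.
sub range_partition {
    my ($key, $number_of_reducers) = @_;
    my $index = 0;
    $index++ while $index < @boundaries && $key ge $boundaries[$index];
    # Clamp in case fewer reducers are configured than there are ranges.
    return $index < $number_of_reducers ? $index : $number_of_reducers - 1;
}

for my $key (qw(apple hadoop zebra)) {
    printf "%s -> reducer %d\n", $key, range_partition($key, 3);
}
```

With three reducers, ''apple'' goes to reducer 0, ''hadoop'' to reducer 1 and ''zebra'' to reducer 2, so every key on a lower-numbered reducer compares less than every key on a higher-numbered one.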
  
 <code perl>
-package Partitioner;
+package My::Partitioner;
 use Moose;
 with 'Hadoop::Partitioner';
Line 35: Line 39:
  
 ...
-package Main;
+package main;
 use Hadoop::Runner;
 
 my $runner = Hadoop::Runner->new(
   ...
-  partitioner => Partitioner->new(),
+  partitioner => My::Partitioner->new(),
   ...);
 ...
Line 56: Line 60:
 Run one MR job on '/home/straka/wiki/cs-text-medium' that creates two output files -- one with an ascending list of unique article names and the other with an ascending list of unique words. You can download the template {{:courses:mapreduce-tutorial:step-8-exercise.txt|step-8-exercise.pl}} and execute it.
   wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-exercise.txt' -O 'step-8-exercise.pl'
 +  # NOW EDIT THE FILE
 +  # $EDITOR step-8-exercise.pl
   rm -rf step-8-out-ex; perl step-8-exercise.pl run /home/straka/wiki/cs-text-medium/ step-8-out-ex
   less step-8-out-ex/part-*
Line 62: Line 68:
 You can also download the solution {{:courses:mapreduce-tutorial:step-8-solution.txt|step-8-solution.pl}} and check the correct output.
   wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-8-solution.txt' -O 'step-8-solution.pl'
 +  # NOW VIEW THE FILE
 +  # $EDITOR step-8-solution.pl
   rm -rf step-8-out-sol; perl step-8-solution.pl run /home/straka/wiki/cs-text-medium/ step-8-out-sol
   less step-8-out-sol/part-*
