| Both sides previous revision Previous revision Next revision | Previous revision | ||
|
courses:mapreduce-tutorial:step-10 [2012/01/25 18:37] straka |
courses:mapreduce-tutorial:step-10 [2012/01/31 09:38] (current) straka Change Perl commandline syntax. |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== MapReduce Tutorial : Combiners ====== | ====== MapReduce Tutorial : Combiners ====== | ||
| + | Sometimes the reduce is a binary operation that is both associative and commutative, | ||
| + | Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the aggregated results are then sent over the network. | ||
| + | |||
| + | A Hadoop job can have such a locally executed reducer, called a // | ||
| + | |||
| + | Typically, the combiner is the same as the reducer of an MR job. | ||
| + | |||
| + | <file perl> | ||
| + | package My::Mapper; | ||
| + | ... | ||
| + | |||
| + | package My:: | ||
| + | ... | ||
| + | |||
| + | package main; | ||
| + | use Hadoop:: | ||
| + | |||
| + | my $runner = Hadoop:: | ||
| + | mapper => My:: | ||
| + | combiner => My:: | ||
| + | reducer => My:: | ||
| + | input_format => ' | ||
| + | ... | ||
| + | </ | ||
| + | |||
| + | ===== Exercise ===== | ||
| + | |||
| + | Compare the effect of adding the combiner to an MR job that counts occurrences of words in ''/ | ||
| + | wget --no-check-certificate ' | ||
| + | # NOW VIEW THE FILE | ||
| + | # $EDITOR step-10-wc-without-combiner.pl | ||
| + | rm -rf step-10-out-wout; | ||
| + | | ||
| + | wget --no-check-certificate ' | ||
| + | # NOW VIEW THE FILE | ||
| + | # $EDITOR step-10-wc-with-combiner.pl | ||
| + | rm -rf step-10-out-with; | ||
| + | |||
| + | How would you explain the results? | ||
| + | |||
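The effect of a combiner described in the tutorial text above can be illustrated without a cluster. The following is a minimal sketch in plain Perl that only simulates a word-count job in memory; it does not use the Hadoop Perl API from the tutorial. The input data and the names `map_words`, `sum_by_key` and `run_job` are hypothetical and chosen only for illustration. It counts how many (word, count) pairs would have to be sent to the reducer with and without local aggregation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: each inner array holds the words seen by one map task.
my @map_task_inputs = (
    [qw(a b a c a)],
    [qw(b b a)],
);

# Map step: emit one [word, 1] pair per input word.
sub map_words {
    my ($words) = @_;
    return map { [$_, 1] } @$words;
}

# Reduce step: sum the counts of each key. Addition is associative and
# commutative, so the same function can also be used as the combiner.
sub sum_by_key {
    my (@pairs) = @_;
    my %sum;
    $sum{ $_->[0] } += $_->[1] for @pairs;
    return map { [$_, $sum{$_}] } sort keys %sum;
}

# Simulate one job. Returns a hash reference with the final counts and the
# number of pairs that would be sent from the map tasks to the reducer.
sub run_job {
    my ($use_combiner) = @_;
    my @shuffled;
    for my $words (@map_task_inputs) {
        my @pairs = map_words($words);
        # The combiner runs locally, on the output of a single map task.
        @pairs = sum_by_key(@pairs) if $use_combiner;
        push @shuffled, @pairs;
    }
    my %result;
    $result{ $_->[0] } = $_->[1] for sum_by_key(@shuffled);
    return (\%result, scalar @shuffled);
}

my ($counts_without, $sent_without) = run_job(0);
my ($counts_with,    $sent_with)    = run_job(1);

printf "pairs sent without combiner: %d\n", $sent_without;
printf "pairs sent with combiner:    %d\n", $sent_with;
printf "%s: %d\n", $_, $counts_with->{$_} for sort keys %$counts_with;
```

For this toy input, 8 pairs are sent without the combiner and 5 with it, while the final counts are the same in both runs (a: 4, b: 3, c: 1). In a real job the saving depends on how often each key repeats within the output of a single map task.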
