Differences
This shows you the differences between two versions of the page.
| courses:mapreduce-tutorial:step-10 [2012/01/25 19:01] straka | courses:mapreduce-tutorial:step-10 [2012/01/31 09:38] (current) straka Change Perl commandline syntax. | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| Sometimes the reduce is a binary operation, which is associative and commutative, | Sometimes the reduce is a binary operation, which is associative and commutative, | ||
| - | Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the results are then sent through the network. | + | Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the aggregated |
| - | A Hadoop job can have such a locally executed reducer, called // | + | A Hadoop job can have such a locally executed reducer, called |
| Typically, the combiner is the same as the reducer of an MR job. | Typically, the combiner is the same as the reducer of an MR job. | ||
| - | <code perl> | + | <file perl> |
| - | package | + | package |
| - | use Moose; | + | ... |
| - | with ' | + | |
| - | sub map { | + | package My::Reducer; |
| - | my ($self, $key, $value, $context) = @_; | + | ... |
| - | foreach my $word (split /\W/, $value) { | + | package main; |
| - | next if not length $word; | + | use Hadoop:: |
| - | | + | |
| - | } | + | |
| - | } | + | |
| - | package Reducer; | + | my $runner = Hadoop:: |
| - | use Moose; | + | |
| - | with ' | + | |
| + | reducer => My:: | ||
| + | input_format => 'KeyValueTextInputFormat' | ||
| + | ... | ||
| + | </ | ||
| - | sub reduce { | + | ===== Exercise ===== |
| - | my ($self, $key, $values, $context) | + | |
| - | my $sum = 0; | + | Compare the effect of adding the combiner to an MR job that counts occurrences of words in ''/ |
| - | | + | wget --no-check-certificate ' |
| - | $sum += $values->value; | + | # NOW VIEW THE FILE |
| - | | + | # $EDITOR step-10-wc-without-combiner.pl |
| + | rm -rf step-10-out-wout; time perl step-10-wc-without-combiner.pl / | ||
| + | | ||
| + | wget --no-check-certificate ' | ||
| + | # NOW VIEW THE FILE | ||
| + | # $EDITOR step-10-wc-with-combiner.pl | ||
| + | | ||
| - | $context-> | + | How would you explain the results? |
| - | } | + | |
| - | + | ||
| - | package Main; | + | |
| - | use Hadoop:: | + | |
| - | + | ||
| - | my $runner = Hadoop:: | + | |
| - | mapper => Mapper-> | + | |
| - | combiner => Reducer-> | + | |
| - | reducer => Reducer-> | + | |
| - | input_format => ' | + | |
| - | $runner-> | + | ---- |
| - | </ | + | |
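
The effect the exercise asks about can be previewed without a cluster. The following is a minimal sketch in plain Python, not the tutorial's Perl API: the function names (`map_words`, `combine`, `reduce_counts`, `run_job`) and the sample input are hypothetical, and each inner list stands in for the input split of one mapper. It counts how many key/value records would cross the network with and without a combiner, assuming the combiner is the same summing operation as the reducer.

```python
import re
from collections import Counter, defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every non-empty word in the line."""
    return [(word, 1) for word in re.split(r"\W+", line) if word]


def combine(pairs):
    """Combiner: sum the counts per word within a single mapper's output."""
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def reduce_counts(shuffled):
    """Reducer: sum all counts that arrive for each word."""
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)


def run_job(splits, use_combiner):
    """Run the simulated job; return the final counts and the number of
    records that would be sent from the mappers to the reducers."""
    shuffled = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            # Summation is associative and commutative, so it is safe to
            # pre-aggregate on the mapper side.
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_counts(shuffled), len(shuffled)


# Hypothetical sample input: two splits, i.e. two simulated mappers.
splits = [
    ["a rose is a rose", "is a rose"],
    ["a daisy is not a rose"],
]

plain_counts, plain_records = run_job(splits, use_combiner=False)
combined_counts, combined_records = run_job(splits, use_combiner=True)

print("records sent without combiner:", plain_records)
print("records sent with combiner:", combined_records)
print("same result:", plain_counts == combined_counts)
```

On this toy input the final counts are identical, while the number of records sent drops from 14 to 8. The saving grows with the number of repeated words per split, which is one plausible way to explain any timing difference seen in the exercise.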
