Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-10 [2012/01/25 19:01] straka |
courses:mapreduce-tutorial:step-10 [2012/01/28 16:44] majlis Commands for execution were added. |
Line 3: | Line 3: | ||
Sometimes the reduce is a binary operation, which is associative and commutative, | Sometimes the reduce is a binary operation, which is associative and commutative, | ||
- | Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the results are then sent through the network. | + | Instead, the reducer can be executed right after the map, on //some portion// of the values belonging to the same key. Only the aggregated |
- | A Hadoop job can have such a locally executed reducer, called // | + | A Hadoop job can have such a locally executed reducer, called |
Typically, the combiner is the same as the reducer of an MR job. | Typically, the combiner is the same as the reducer of an MR job. | ||
- | <code perl> | + | <file perl> |
package Mapper; | package Mapper; | ||
- | use Moose; | + | ... |
- | with ' | + | |
- | + | ||
- | sub map { | + | |
- | my ($self, $key, $value, $context) = @_; | + | |
- | + | ||
- | foreach my $word (split /\W/, $value) { | + | |
- | next if not length $word; | + | |
- | $context-> | + | |
- | } | + | |
- | } | + | |
package Reducer; | package Reducer; | ||
- | use Moose; | + | ... |
- | with ' | + | |
- | + | ||
- | sub reduce { | + | |
- | my ($self, $key, $values, $context) = @_; | + | |
- | + | ||
- | my $sum = 0; | + | |
- | while ($values-> | + | |
- | $sum += $values-> | + | |
- | } | + | |
- | + | ||
- | $context-> | + | |
- | } | + | |
package Main; | package Main; | ||
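Why the reduce operation must be associative and commutative before it can be reused as a combiner can be checked with a small plain-Perl sketch. It is not Hadoop code. The values for one key and their split between two map tasks are made up for illustration. Summing each portion first, as a combiner would, and then summing the partial results gives the same total as summing everything at once. Averaging partial averages, by contrast, does not reproduce the overall mean, so a reducer that computes a mean could not be reused as a combiner unchanged.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical values emitted by the mappers for a single key.
my @values = (1, 2, 3, 4, 10);

# Hypothetical split of the same values between two map tasks; a combiner
# only sees the portion produced by its own map task.
my @portions = ([1, 2, 3], [4, 10]);

# Arithmetic mean of a non-empty list.
sub mean { return sum(@_) / scalar(@_) }

# Sum is associative and commutative: combining each portion first and
# then reducing the partial sums matches reducing all values directly.
my $direct_sum   = sum(@values);
my $combined_sum = sum(map { sum(@$_) } @portions);

# Mean is not: the mean of the partial means differs from the overall mean.
my $direct_mean   = mean(@values);
my $combined_mean = mean(map { mean(@$_) } @portions);

printf "sum:  direct=%d combined=%d\n", $direct_sum, $combined_sum;
printf "mean: direct=%.2f combined=%.2f\n", $direct_mean, $combined_mean;
```

Running it prints a sum of 20 on both sides of the comparison, but means of 4.00 (direct) versus 4.50 (combined).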
Line 43: | Line 21: | ||
my $runner = Hadoop:: | my $runner = Hadoop:: | ||
mapper => Mapper-> | mapper => Mapper-> | ||
- | combiner => Reducer-> | + | combiner => Reducer-> |
reducer => Reducer-> | reducer => Reducer-> | ||
input_format => ' | input_format => ' | ||
+ | ... | ||
+ | </ | ||
+ | |||
+ | ===== Exercise ===== | ||
+ | |||
+ | Compare the effect of adding the combiner to an MR job that counts occurrences of words in ''/ |
+ | wget --no-check-certificate ' | ||
+ | rm -rf step-10-out-wout; | ||
+ | | ||
+ | wget --no-check-certificate ' | ||
+ | rm -rf step-10-out-with; | ||
+ | |||
+ | How would you explain the results? | ||
- | $runner-> | + | ---- |
- | </ | + | |
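The exercise above can be previewed without a cluster. The following plain-Perl sketch simulates a word count on a tiny, made-up input split between two map tasks. It counts how many (word, count) records would be sent to the reducers with and without a combiner. The sub names `map_task`, `combine` and `reduce_all` are hypothetical and used only for illustration. As a simplifying assumption, the sketch runs the combiner exactly once over each map task's whole output. Hadoop itself may apply the combiner to smaller portions, or not at all.

```perl
use strict;
use warnings;

# Hypothetical input: each element lists the lines read by one map task.
my @map_tasks = (
    ["the cat sat", "the cat ran"],
    ["the dog sat"],
);

# Map phase (hypothetical helper): emit a (word, 1) pair for every word,
# splitting on non-word characters like the word-count mapper does.
sub map_task {
    my @pairs;
    foreach my $line (@_) {
        foreach my $word (split /\W/, $line) {
            next if not length $word;
            push @pairs, [$word, 1];
        }
    }
    return @pairs;
}

# Combiner (hypothetical helper, same logic as the reducer): sum the counts
# of each word, but only within the output of a single map task.
sub combine {
    my %sum;
    $sum{ $_->[0] } += $_->[1] foreach @_;
    return map { [$_, $sum{$_}] } sort keys %sum;
}

# Reduce phase (hypothetical helper): sum all pairs arriving for each word.
sub reduce_all {
    my %total;
    $total{ $_->[0] } += $_->[1] foreach @_;
    return %total;
}

my (@without, @with);
foreach my $task (@map_tasks) {
    my @pairs = map_task(@$task);
    push @without, @pairs;            # every pair crosses the network
    push @with,    combine(@pairs);   # only per-task partial sums do
}

my %counts_without = reduce_all(@without);
my %counts_with    = reduce_all(@with);

printf "records shipped without combiner: %d\n", scalar @without;
printf "records shipped with combiner:    %d\n", scalar @with;
print "$_ => $counts_with{$_}\n" foreach sort keys %counts_with;
```

On this made-up input, 9 records are shipped without the combiner and 7 with it, while the final counts are identical. With real text, where frequent words repeat many times within one map task's input, the saving can be much larger.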