Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-5 [2012/01/24 22:36] straka |
courses:mapreduce-tutorial:step-5 [2012/01/31 15:56] (current) straka |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== MapReduce Tutorial : Basic reducer ====== | ====== MapReduce Tutorial : Basic reducer ====== | ||
- | The interesting part of a MR job is the reducer -- after all mappers produce the (key, value) pairs, for every unique key and all its values a '' | + | The interesting part of a Hadoop |
- | The '' | + | The '' |
- | <file perl reducer.pl> | + | <file perl> |
- | package Mapper; | + | package |
use Moose; | use Moose; | ||
with ' | with ' | ||
Line 16: | Line 16: | ||
} | } | ||
- | package Reducer; | + | package |
use Moose; | use Moose; | ||
with ' | with ' | ||
Line 28: | Line 28: | ||
} | } | ||
- | package | + | package |
use Hadoop:: | use Hadoop:: | ||
my $runner = Hadoop:: | my $runner = Hadoop:: | ||
- | mapper => Mapper-> | + | mapper => My::Mapper-> |
- | reducer => Reducer-> | + | reducer => My::Reducer-> |
$runner-> | $runner-> | ||
</ | </ | ||
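The grouping that happens between the mappers and the reducer can be pictured without Hadoop at all. The following is a minimal sketch in plain Perl, not the tutorial's Perl API: `map_line`, `reduce_key` and `run_job` are hypothetical helper names, and the word-count logic is only an illustration. It collects all (key, value) pairs, groups the values by key, and calls the reduce step once per unique key.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a mapper: plain subroutine, not the tutorial's API.
# Emits one (word, 1) pair for every whitespace-separated word of the line.
sub map_line {
    my ($line) = @_;
    return map { [ $_, 1 ] } split ' ', $line;
}

# Hypothetical stand-in for a reducer: receives one key and all its values.
sub reduce_key {
    my ($key, $values) = @_;
    my $sum = 0;
    $sum += $_ for @$values;
    return [ $key, $sum ];
}

# Simulates map, group-by-key and reduce in a single process.
sub run_job {
    my (@lines) = @_;

    # Map phase: every input line may produce any number of pairs.
    my @pairs = map { map_line($_) } @lines;

    # Grouping: collect all values that share a key.
    my %grouped;
    push @{ $grouped{ $_->[0] } }, $_->[1] for @pairs;

    # Reduce phase: one call per unique key (keys sorted as strings here
    # so that the output order is reproducible).
    return map { reduce_key($_, $grouped{$_}) } sort keys %grouped;
}

my @result = run_job('a b a', 'b a c');
print join("\t", @$_), "\n" for @result;
```

In a real job the pairs come from many mappers on different machines, so the grouping is done by the framework rather than by an in-memory hash.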
+ | |||
+ | As before, Hadoop handles failures silently. Even a mapper that finished successfully may have to be executed again -- if the machine that stored its output data gets disconnected from the network. | ||
+ | |||
+ | ===== Types of keys and values ===== | ||
+ | |||
+ | Currently in the Perl API, both keys and values are strings, which are stored and loaded as UTF-8 and compared lexicographically. If you need more complex structures, you have to serialize and deserialize them yourself. | ||
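Two consequences of string-only keys and values can be sketched in plain Perl. This is an illustration under assumptions, not part of the tutorial's API: the zero-padding width, the helper names `serialize` and `deserialize`, and the choice of JSON (via the core module `JSON::PP`) as the serialization format are all hypothetical. The first part shows that numeric keys compared as strings do not sort numerically unless they are padded to a fixed width; the second part packs a structured value into a single string and restores it.

```perl
use strict;
use warnings;
use JSON::PP;

# Numbers used as keys are compared as strings, so "10" sorts before "2".
my @keys       = (2, 10, 1);
my @as_strings = sort { $a cmp $b } @keys;    # 1, 10, 2

# Assumed workaround: zero-pad to a fixed width (5 digits chosen arbitrarily)
# so that string order and numeric order agree.
my @padded = sort { $a cmp $b } map { sprintf '%05d', $_ } @keys;
# 00001, 00002, 00010

# A structured value has to travel as one string. JSON is one possible
# format; canonical mode sorts hash keys so equal data gives equal strings.
my $json = JSON::PP->new->canonical;

sub serialize   { my ($data)   = @_; return $json->encode($data) }
sub deserialize { my ($string) = @_; return $json->decode($string) }

my $value = serialize({ word => 'hadoop', count => 3 });
my $back  = deserialize($value);

print "@as_strings\n";
print "@padded\n";
print "$back->{word} $back->{count}\n";
```

Any format that yields a single string works; the point is only that the encoding and decoding happen in your own mapper and reducer code.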
+ | |||
+ | The Java API offers a wide range of types, including user-defined types, to be used for keys and values. | ||
===== Exercise 1 ===== | ===== Exercise 1 ===== | ||
- | Run a MR job on / | + | Run a Hadoop |
+ | wget --no-check-certificate ' | ||
+ | # NOW EDIT THE FILE | ||
+ | # $EDITOR step-5-exercise1.pl | ||
+ | rm -rf step-5-out-ex1; | ||
+ | less step-5-out-ex1/ | ||
+ | |||
+ | ==== Solution ==== | ||
+ | You can also download the solution {{: | ||
+ | wget --no-check-certificate ' | ||
+ | # NOW VIEW THE FILE | ||
+ | # $EDITOR step-5-solution1.pl | ||
+ | rm -rf step-5-out-sol1; | ||
+ | less step-5-out-sol1/ | ||
- | {{: | ||
===== Exercise 2 ===== | ===== Exercise 2 ===== | ||
- | Run a MR job on / | + | Run a Hadoop |
+ | wget --no-check-certificate ' | ||
+ | # NOW EDIT THE FILE | ||
+ | # $EDITOR step-5-exercise2.pl | ||
+ | rm -rf step-5-out-ex2; | ||
+ | less step-5-out-ex2/ | ||
+ | |||
+ | ==== Solution ==== | ||
+ | You can also download the solution {{: | ||
+ | wget --no-check-certificate ' | ||
+ | # NOW VIEW THE FILE | ||
+ | # $EDITOR step-5-solution2.pl | ||
+ | rm -rf step-5-out-sol2; | ||
+ | less step-5-out-sol2/ | ||
+ | |||