courses:mapreduce-tutorial:step-5 [2012/01/24 19:04] straka (created)
courses:mapreduce-tutorial:step-5 [2012/01/31 15:56] (current) straka
====== MapReduce Tutorial : Basic reducer ======

The interesting part of a Hadoop job is the //reducer// -- after all mappers produce their (key, value) pairs, the ''reduce'' method of the reducer is called for every unique key, together with all values associated with that key.

The ''reduce'' method receives a key, an iterator over all values of that key, and a context, which is used to output its own (key, value) pairs.

<file perl>
package My::Mapper;
use Moose;
with 'Hadoop::Mapper';

sub map {
  my ($self, $key, $value, $context) = @_;

  $context->write($key, $value);
}

package My::Reducer;
use Moose;
with 'Hadoop::Reducer';

sub reduce {
  my ($self, $key, $values, $context) = @_;

  while ($values->next) {
    $context->write($key, $values->value);
  }
}

package main;
use Hadoop::Runner;

my $runner = Hadoop::Runner->new(
  mapper => My::Mapper->new(),
  reducer => My::Reducer->new());

$runner->run();
</file>
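
The identity reducer above only copies its input. To make the iteration over a key's values concrete, here is a standalone sketch of a //summing// reducer body. The ''MockValues'' class and the ''sum_reduce'' helper are hypothetical stand-ins, used only so the example runs outside Hadoop; in a real job the logic would live in a Moose class with the reducer role as above, and the value iterator would be supplied by the framework.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the $values iterator passed to reduce():
# next() advances to the following value, value() returns the current one.
package MockValues;
sub new   { my ($class, @v) = @_; bless { v => [@v], i => -1 }, $class }
sub next  { my $self = shift; ++$self->{i} <= $#{$self->{v}} }
sub value { my $self = shift; $self->{v}[$self->{i}] }

package main;

# The body of a summing reducer: add up all values of a key
# and emit a single (key, sum) pair.
sub sum_reduce {
    my ($key, $values) = @_;
    my $sum = 0;
    $sum += $values->value while $values->next;
    return ($key, $sum);
}

my ($k, $s) = sum_reduce("word", MockValues->new(1, 2, 3));
print "$k\t$s\n";   # prints "word" and "6" separated by a tab
```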

As before, Hadoop silently handles failures. It can happen that even a successfully finished mapper needs to be executed again -- if the machine where its output data were stored gets disconnected from the network.

===== Types of keys and values =====

Currently in the Perl API, the keys and values are both strings, which are stored and loaded in UTF-8 format and compared lexicographically. If you need more complex structures, you have to serialize and deserialize them yourself.
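
Since only strings can be stored, a structured value must be flattened to a string before being written and parsed back in the reducer. Below is a minimal sketch of such manual serialization, assuming the individual fields never contain the chosen separator (a tab here); the helper names ''serialize'' and ''deserialize'' are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pack a list of fields into one tab-separated string, and back.
# Assumes the fields themselves contain no tab characters.
sub serialize   { join("\t", @_) }
sub deserialize { split /\t/, $_[0], -1 }   # -1 keeps trailing empty fields

# In a mapper one would write:   $context->write($key, serialize(@fields));
# and in the reducer:            my @fields = deserialize($values->value);
my $packed = serialize("NNP", 3, "0.5");
my @fields = deserialize($packed);
print "@fields\n";   # prints "NNP 3 0.5"
```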

The Java API offers a wide range of types, including user-defined types, to be used for keys and values.

===== Exercise 1 =====

Run a Hadoop job on ''/
  wget --no-check-certificate '
  # NOW EDIT THE FILE
  # $EDITOR step-5-exercise1.pl
  rm -rf step-5-out-ex1;
  less step-5-out-ex1/

==== Solution ====

You can also download the solution {{:
  wget --no-check-certificate '
  # NOW VIEW THE FILE
  # $EDITOR step-5-solution1.pl
  rm -rf step-5-out-sol1;
  less step-5-out-sol1/

===== Exercise 2 =====

Run a Hadoop job on ''/
  wget --no-check-certificate '
  # NOW EDIT THE FILE
  # $EDITOR step-5-exercise2.pl
  rm -rf step-5-out-ex2;
  less step-5-out-ex2/

==== Solution ====

You can also download the solution {{:
  wget --no-check-certificate '
  # NOW VIEW THE FILE
  # $EDITOR step-5-solution2.pl
  rm -rf step-5-out-sol2;
  less step-5-out-sol2/

----
