courses:mapreduce-tutorial:perl-api [2012/01/23 21:07] straka created |
courses:mapreduce-tutorial:perl-api [2012/01/23 21:28] straka |
====== MapReduce Tutorial - Perl API ====== | ====== MapReduce Tutorial - Perl API ====== |
| |
The main class is ''Hadoop::Runner'': | ===== Hadoop::Runner ===== |
<code> | |
| <code perl> |
package Hadoop::Runner; | package Hadoop::Runner; |
| use Moose; |
| |
has 'mapper' => (does => 'Hadoop::Mapper', required => 1); | has 'mapper' => (does => 'Hadoop::Mapper', required => 1); |
* ''keep_env'' -- which environment variables are preserved when running perl mappers, reducers, combiners and partitioners | * ''keep_env'' -- which environment variables are preserved when running perl mappers, reducers, combiners and partitioners |
| |
| ===== Hadoop::Mapper ===== |
| |
| <code perl> |
| package Hadoop::Mapper; |
| use Moose::Role; |
| |
| requires 'map'; |
| |
| sub setup { 1; } |
| sub cleanup { 1; } |
| </code> |
| * ''sub map($self, $key, $value, $context)'' -- executed for every (key, value) input pair. The variable ''$context'' has the following methods: |
| * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair |
| * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment'' |
| * ''sub setup($self, $context)'' -- executed once before any input (key, value) pairs are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters. |
| * ''sub cleanup($self, $context)'' -- executed once after all input (key, value) pairs are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters. |
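The mapper interface described above can be illustrated with a minimal word-count sketch. It is not taken from the tutorial: ''WordCountMapper'' and ''MockContext'' are hypothetical names, and the mock only imitates the ''write'' and ''counter'' methods listed above so that the example runs without Hadoop or Moose. With the real framework the class would presumably be a Moose class consuming the role via ''with 'Hadoop::Mapper'''.

```perl
use strict;
use warnings;

# Hypothetical mapper (assumption: not part of the tutorial's API).
# Plain Perl is used instead of Moose so the sketch is self-contained.
package WordCountMapper;

sub new {
    my ($class) = @_;
    my $self = { lines => 0 };
    return bless $self, $class;
}

# Executed once before any input pair is processed.
sub setup {
    my ($self, $context) = @_;
    $self->{lines} = 0;
    return 1;
}

# Executed for every (key, value) pair: emits (word, 1) for each word.
sub map {
    my ($self, $key, $value, $context) = @_;
    $self->{lines}++;
    for my $word (split ' ', $value) {
        $context->write($word, 1);
        $context->counter('WordCount', 'words', 1);
    }
    return;
}

# Executed once after all input pairs are processed.
sub cleanup {
    my ($self, $context) = @_;
    $context->counter('WordCount', 'lines', $self->{lines});
    return 1;
}

# Stand-in for the framework's $context object (assumption, for local runs only).
package MockContext;

sub new {
    my ($class) = @_;
    my $self = { pairs => [], counters => {} };
    return bless $self, $class;
}

# Records the emitted (key, value) pair.
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [ $key, $value ];
    return;
}

# Adds $increment to the counter $name in the group $group.
sub counter {
    my ($self, $group, $name, $increment) = @_;
    $self->{counters}{$group}{$name} += $increment;
    return;
}

package main;

my $context = MockContext->new;
my $mapper  = WordCountMapper->new;
$mapper->setup($context);
$mapper->map(0, 'a b a', $context);
$mapper->map(1, 'c', $context);
$mapper->cleanup($context);
```

Keeping per-task state (here the line count) in the object and reporting it from ''cleanup'' avoids one counter call per input pair.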
| |
| ===== Hadoop::Reducer ===== |
| |
| <code perl> |
| package Hadoop::Reducer; |
| use Moose::Role; |
| |
| requires 'reduce'; |
| |
| sub setup { 1; } |
| sub cleanup { 1; } |
| </code> |
| * ''sub reduce($self, $key, $values, $context)'' -- executed for every ''$key''. ''$values'' is an iterator with the following methods: |
| * ''$values%%->%%value()'' -- returns the current value, or undef if there is none. |
| * ''$values%%->%%next()'' -- advances to the next value. Returns true if there is one, false otherwise. |
| * At the beginning there is no current value; the first value must be obtained by calling ''next''. |
| * The ''$context'' argument of ''reduce'' has the following methods: |
| * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair |
| * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment'' |
| * ''sub setup($self, $context)'' -- executed once before any input keys are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters. |
| * ''sub cleanup($self, $context)'' -- executed once after all input keys are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters. |
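The iterator protocol described above (call ''next'' first, then read ''value'') can be sketched with a reducer that sums the values of each key. The names ''SumReducer'', ''MockValues'' and ''MockContext'' are hypothetical; the mocks only imitate the methods listed above so that the example runs without Hadoop or Moose, and a real reducer would presumably consume the role via ''with 'Hadoop::Reducer'''.

```perl
use strict;
use warnings;

# Hypothetical reducer (assumption: not part of the tutorial's API).
package SumReducer;

sub new {
    my ($class) = @_;
    my $self = {};
    return bless $self, $class;
}

# Executed once per key: sums all values and emits (key, sum).
sub reduce {
    my ($self, $key, $values, $context) = @_;
    my $sum = 0;
    # There is no current value until next() has been called once.
    while ($values->next) {
        $sum += $values->value;
    }
    $context->write($key, $sum);
    return;
}

# Stand-in for the $values iterator (assumption, for local runs only).
package MockValues;

sub new {
    my ($class, @items) = @_;
    my $self = { items => [@items], current => undef };
    return bless $self, $class;
}

# Advances to the next value; true if there is one, false otherwise.
sub next {
    my ($self) = @_;
    if (@{ $self->{items} }) {
        $self->{current} = shift @{ $self->{items} };
        return 1;
    }
    $self->{current} = undef;
    return 0;
}

# Returns the current value, or undef if there is none.
sub value {
    my ($self) = @_;
    return $self->{current};
}

# Stand-in for the $context object (assumption, for local runs only).
package MockContext;

sub new {
    my ($class) = @_;
    my $self = { pairs => [] };
    return bless $self, $class;
}

# Records the emitted (key, value) pair.
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [ $key, $value ];
    return;
}

package main;

my $context = MockContext->new;
my $reducer = SumReducer->new;
$reducer->reduce('a', MockValues->new(1, 1, 1), $context);
$reducer->reduce('b', MockValues->new(2), $context);
```

The ''while ($values%%->%%next)'' loop handles an empty value list naturally, because ''value'' is never read before ''next'' has returned true.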