  * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster.
  * ''keep_env'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners
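If these options are attributes of ''Hadoop::Runner'', a driver script could set them when constructing the runner. The following is only a rough sketch: the ''mapper'' attribute, the class ''My::WordCountMapper'' (see the Hadoop::Mapper section below) and the concrete values are assumptions made for illustration, not part of this page.

<code perl>
use Hadoop::Runner;

# Hypothetical driver sketch: everything except hadoop_prefix and keep_env
# is an assumption made for illustration.
my $runner = Hadoop::Runner->new(
    mapper        => My::WordCountMapper->new(),  # hypothetical class consuming Hadoop::Mapper
    hadoop_prefix => '/opt/hadoop',               # assumed path; the default is fine on the UFAL cluster
    keep_env      => 'PERL5LIB',                  # assumed value; exact format may differ
);
$runner->run();
</code>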
==== Command line arguments supported by Hadoop::Runner::run() ====

===== Hadoop::Mapper =====

<code perl>
package Hadoop::Mapper;
use Moose::Role;

requires 'map';

sub setup() {}
sub cleanup {}
</code>
  * ''sub map($self, $key, $value, $context)'' -- executed for every input (key, value) pair. The variable ''$context'' has the following methods:
    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
  * ''sub setup($self, $context)'' -- executed once before any input (key, value) pairs are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
  * ''sub cleanup($self, $context)'' -- executed once after all input (key, value) pairs are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
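As a minimal illustration, a mapper class consuming this role could look as follows; the class name ''My::WordCountMapper'' and the counter group and name are made up for the example.

<code perl>
package My::WordCountMapper;
use Moose;
with 'Hadoop::Mapper';

# For every input (key, value) pair, emit each whitespace-separated word
# of the value with count 1 and bump a counter via the context.
sub map {
    my ($self, $key, $value, $context) = @_;
    for my $word (split /\s+/, $value) {
        $context->write($word, 1);
        $context->counter('Example', 'words', 1);
    }
}

1;
</code>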
===== Hadoop::Reducer =====

<code perl>
package Hadoop::Reducer;
use Moose::Role;

requires 'reduce';

sub setup() {}
sub cleanup {}
</code>
  * ''sub reduce($self, $key, $values, $context)'' -- executed for every ''$key''. The ''$values'' is an iterator with the following methods:
    * ''$values%%->%%value()'' -- returns the current value, or undef if there is none.
    * ''$values%%->%%next()'' -- advances to the next value; returns true if there is one, false otherwise.
    * At the beginning there is no current value; the first value must be obtained by calling ''next''.
  * ''sub reduce($self, $key, $values, $context)'' -- the variable ''$context'' has the following methods:
    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
  * ''sub setup($self, $context)'' -- executed once before any input keys are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
  * ''sub cleanup($self, $context)'' -- executed once after all input keys are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
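A minimal reducer consuming this role might then sum the values of each key; the class name ''My::SumReducer'' is made up for the example.

<code perl>
package My::SumReducer;
use Moose;
with 'Hadoop::Reducer';

# The $values iterator starts before the first value, so call next()
# first and read value() until next() returns false.
sub reduce {
    my ($self, $key, $values, $context) = @_;
    my $sum = 0;
    while ($values->next()) {
        $sum += $values->value();
    }
    $context->write($key, $sum);
}

1;
</code>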
===== Hadoop::Partitioner =====

<code perl>
package Hadoop::Partitioner;
use Moose::Role;

requires 'getPartition';

sub setup {}
sub cleanup {}
</code>
  * ''sub getPartition($self, $key, $value, $partitions)'' -- executed for every output (key, value) pair. It must return the number of the partition, in the range 0..$partitions-1, into which the output (key, value) pair should be placed.
  * ''sub setup($self)'' -- executed once before any input (key, value) pairs are processed.
  * ''sub cleanup($self)'' -- executed once after all input (key, value) pairs are processed.
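As a sketch, a partitioner consuming this role could distribute keys by their first character; the class name ''My::FirstBytePartitioner'' is made up for the example.

<code perl>
package My::FirstBytePartitioner;
use Moose;
with 'Hadoop::Partitioner';

# Return a partition number in 0..$partitions-1 based on the ordinal
# value of the key's first character.
sub getPartition {
    my ($self, $key, $value, $partitions) = @_;
    return ord(substr($key, 0, 1)) % $partitions;
}

1;
</code>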