courses:mapreduce-tutorial:perl-api [2012/01/23 21:33] straka |
courses:mapreduce-tutorial:perl-api [2012/01/25 14:32] straka |
* ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value works in the UFAL cluster. | * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value works in the UFAL cluster. |
* ''keep_env'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners | * ''keep_env'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners |
| |
| ==== Command line arguments supported by Hadoop::Runner::run() ==== |
| |
===== Hadoop::Mapper ===== | ===== Hadoop::Mapper ===== |
requires 'map'; | requires 'map'; |
| |
sub setup() { } | sub setup() {} |
sub cleanup { } | sub cleanup {} |
</code> | </code> |
* ''sub map($self, $key, $value, $context)'' -- executed for every (key, value) input pair. The variable ''$context'' has the following methods: | * ''sub map($self, $key, $value, $context)'' -- executed for every (key, value) input pair. The variable ''$context'' has the following methods: |
requires 'reduce'; | requires 'reduce'; |
| |
sub setup() { } | sub setup() {} |
sub cleanup { } | sub cleanup {} |
</code> | </code> |
* ''sub reduce($self, $key, $values, $context)'' -- executed for every ''$key''. ''$values'' is an iterator with the following methods: | * ''sub reduce($self, $key, $values, $context)'' -- executed for every ''$key''. ''$values'' is an iterator with the following methods: |
requires 'getPartition'; | requires 'getPartition'; |
| |
sub setup { } | sub setup {} |
sub cleanup { } | sub cleanup {} |
| |
</code> | </code> |
* ''sub getPartition($self, $key, $value, $partitions)'' -- executed for every (key, value) input pair. It must return a number in the range 0..$partitions-1, | * ''sub getPartition($self, $key, $value, $partitions)'' -- executed for every output (key, value) pair. It must return the number of the partition, in the range 0..$partitions-1, in which the output (key, value) pair should be placed. |
* ''sub setup($self)'' -- executed once before any input (key, value) pairs are processed. | * ''sub setup($self)'' -- executed once before any input (key, value) pairs are processed. |
* ''sub cleanup($self)'' -- executed once after all input (key, value) pairs are processed. | * ''sub cleanup($self)'' -- executed once after all input (key, value) pairs are processed. |
| |
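A minimal sketch of a partitioner that satisfies the ''getPartition'' contract described above: it hashes only the key, so every pair with the same key lands in the same partition, and the result always lies in 0..$partitions-1. The class name ''KeyHashPartitioner'' and the hash function are hypothetical. In a real job the class would be a Moose class consuming the partitioner role (assumed here to be ''Hadoop::Partitioner''); that part is left out so the sketch runs without Hadoop or Moose installed.

<code perl>
use strict;
use warnings;

# Hypothetical partitioner. With the Perl API it would consume the
# partitioner role (assumed name: Hadoop::Partitioner) instead of
# defining its own constructor.
package KeyHashPartitioner;

sub new { my ($class) = @_; return bless {}, $class }

# No per-task state is needed, so setup and cleanup do nothing.
sub setup   {}
sub cleanup {}

# Called for every output (key, value) pair; returns a partition
# number in the range 0..$partitions-1.
sub getPartition {
    my ($self, $key, $value, $partitions) = @_;
    my $h = 0;
    # Simple multiplicative string hash over the characters of the key,
    # kept within 32 bits. The value is deliberately ignored.
    $h = ($h * 31 + ord($_)) % 4294967296 for split //, $key;
    return $h % $partitions;
}

package main;

my $p = KeyHashPartitioner->new;
for my $key (qw(apple banana cherry)) {
    printf "%s -> %d\n", $key, $p->getPartition($key, 1, 4);
}
</code>

Ignoring ''$value'' is the important design choice here: a reducer expects all values for a given key to arrive in a single partition, which holds only if the partition depends on the key alone.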