  * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster.
  * ''keep_env'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners
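If these options are attributes of ''Hadoop::Runner'', a driver script could set them when constructing the runner. The following is only a rough sketch: the ''mapper'' attribute, the class ''My::WordCountMapper'' (see the Hadoop::Mapper section below) and the concrete values are assumptions made for illustration, not part of this page.

<code perl>
use Hadoop::Runner;

# Hypothetical driver sketch: everything except hadoop_prefix and keep_env
# is an assumption made for illustration.
my $runner = Hadoop::Runner->new(
    mapper        => My::WordCountMapper->new(),  # hypothetical class consuming Hadoop::Mapper
    hadoop_prefix => '/opt/hadoop',               # assumed path; the default is fine on the UFAL cluster
    keep_env      => 'PERL5LIB',                  # assumed value; exact format may differ
);
$runner->run();
</code>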
==== Command line arguments supported by Hadoop::Runner::run() ====

===== Hadoop::Mapper =====

<code perl>
package Hadoop::Mapper;
use Moose::Role;

requires 'map';

sub setup() {}
sub cleanup {}
</code>
  * ''sub map($self, $key, $value, $context)'' -- executed for every input (key, value) pair. The variable ''$context'' has the following methods:
    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
  * ''sub setup($self, $context)'' -- executed once before any input (key, value) pairs are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
  * ''sub cleanup($self, $context)'' -- executed once after all input (key, value) pairs are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
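As a minimal illustration, a mapper class consuming this role could look as follows; the class name ''My::WordCountMapper'' and the counter group and name are made up for the example.

<code perl>
package My::WordCountMapper;
use Moose;
with 'Hadoop::Mapper';

# For every input (key, value) pair, emit each whitespace-separated word
# of the value with count 1 and bump a counter via the context.
sub map {
    my ($self, $key, $value, $context) = @_;
    for my $word (split /\s+/, $value) {
        $context->write($word, 1);
        $context->counter('Example', 'words', 1);
    }
}

1;
</code>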
===== Hadoop::Reducer =====

<code perl>
package Hadoop::Reducer;
use Moose::Role;

requires 'reduce';

sub setup() {}
sub cleanup {}
</code>
  * ''sub reduce($self, $key, $values, $context)'' -- executed for every ''$key''. The ''$values'' is an iterator with the following methods:
    * ''$values%%->%%value()'' -- returns the current value, or undef if there is none.
    * ''$values%%->%%next()'' -- advances to the next value; returns true if there is one, false otherwise.
    * At the beginning there is no current value; the first value must be obtained by calling ''next''.
  * ''sub reduce($self, $key, $values, $context)'' -- the variable ''$context'' has the following methods:
    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
  * ''sub setup($self, $context)'' -- executed once before any input keys are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
  * ''sub cleanup($self, $context)'' -- executed once after all input keys are processed. The ''$context'' can be used both to produce (key, value) pairs and to increment counters.
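A minimal reducer consuming this role might then sum the values of each key; the class name ''My::SumReducer'' is made up for the example.

<code perl>
package My::SumReducer;
use Moose;
with 'Hadoop::Reducer';

# The $values iterator starts before the first value, so call next()
# first and read value() until next() returns false.
sub reduce {
    my ($self, $key, $values, $context) = @_;
    my $sum = 0;
    while ($values->next()) {
        $sum += $values->value();
    }
    $context->write($key, $sum);
}

1;
</code>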
===== Hadoop::Partitioner =====

<code perl>
package Hadoop::Partitioner;
use Moose::Role;

requires 'getPartition';

sub setup {}
sub cleanup {}
</code>
  * ''sub getPartition($self, $key, $value, $partitions)'' -- executed for every output (key, value) pair. It must return the number of the partition, in the range 0..$partitions-1, into which the output (key, value) pair should be placed.
  * ''sub setup($self)'' -- executed once before any input (key, value) pairs are processed.
  * ''sub cleanup($self)'' -- executed once after all input (key, value) pairs are processed.
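As a sketch, a partitioner consuming this role could distribute keys by their first character; the class name ''My::FirstBytePartitioner'' is made up for the example.

<code perl>
package My::FirstBytePartitioner;
use Moose;
with 'Hadoop::Partitioner';

# Return a partition number in 0..$partitions-1 based on the ordinal
# value of the key's first character.
sub getPartition {
    my ($self, $key, $value, $partitions) = @_;
    return ord(substr($key, 0, 1)) % $partitions;
}

1;
</code>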