Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:perl-api [2012/01/23 21:07]
straka created
courses:mapreduce-tutorial:perl-api [2012/01/23 21:34]
straka
Line 1: Line 1:
 ====== MapReduce Tutorial - Perl API ====== ====== MapReduce Tutorial - Perl API ======
  
-The main class is ''Hadoop::Runner'': +===== Hadoop::Runner ===== 
-<code>+ 
 +<code perl>
 package Hadoop::Runner; package Hadoop::Runner;
 +use Moose;
  
 has 'mapper' => (does => 'Hadoop::Mapper', required => 1); has 'mapper' => (does => 'Hadoop::Mapper', required => 1);
Line 28: Line 30:
   * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster.   * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster.
   * ''keep_env'' -- which environment variables are preserved when running perl mappers, reducers, combiners and partitioners   * ''keep_env'' -- which environment variables are preserved when running perl mappers, reducers, combiners and partitioners
 +
 +===== Hadoop::Mapper =====
 +
 +<code perl>
 +package Hadoop::Mapper;
 +use Moose::Role;
 +
 +requires 'map';
 +
 +sub setup {}
 +sub cleanup {}
 +</code>
 +  * ''sub map($self, $key, $value, $context)'' -- executed for every input (key, value) pair. The ''$context'' variable has the following methods:
 +    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
 +    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
 +  * ''sub setup($self, $context)'' -- executed once before any input (key, value) pairs are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters.
 +  * ''sub cleanup($self, $context)'' -- executed once after all input (key, value) pairs are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters.
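
The mapper interface described above can be sketched as follows. This is only an illustration: ''MockContext'' and ''WordCountMapper'' are hypothetical names, the mock stands in for the real ''$context'' object, and the class is written in plain Perl so that it runs without Hadoop. A real mapper would presumably ''use Moose;'' and declare ''with 'Hadoop::Mapper';''.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $context object: it only records calls.
package MockContext;
sub new { return bless { pairs => [], counters => {} }, shift }
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [$key, $value];
}
sub counter {
    my ($self, $group, $name, $increment) = @_;
    $self->{counters}{$group}{$name} += $increment;
}

# Hypothetical word-count mapper (assumption: the real class would consume
# the Hadoop::Mapper role instead of defining its own constructor).
package WordCountMapper;
sub new { return bless {}, shift }
sub map {
    my ($self, $key, $value, $context) = @_;
    for my $word (split ' ', $value) {
        $context->write($word, 1);                # emit (word, 1)
        $context->counter('Mapper', 'Words', 1);  # count emitted words
    }
}

package main;
my $context = MockContext->new;
WordCountMapper->new->map(0, 'to be or not to be', $context);
printf "%s\t%s\n", @$_ for @{ $context->{pairs} };
```

The mock keeps the written pairs in memory, which makes it easy to check a mapper's output before running it on the cluster.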
 +
 +===== Hadoop::Reducer =====
 +
 +<code perl>
 +package Hadoop::Reducer;
 +use Moose::Role;
 +
 +requires 'reduce';
 +
 +sub setup {}
 +sub cleanup {}
 +</code>
 +  * ''sub reduce($self, $key, $values, $context)'' -- executed for every ''$key''. The ''$values'' is an iterator with the following methods:
 +    * ''$values%%->%%value()'' -- returns the current value, or undef if there is none.
 +    * ''$values%%->%%next()'' -- advances to the next value. Returns true if there is one, false otherwise.
 +    * At the beginning there is no current value; the first value must be obtained by calling ''next''.
 +  * In ''reduce'', the ''$context'' variable has the following methods:
 +    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
 +    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
 +  * ''sub setup($self, $context)'' -- executed once before any input keys are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters.
 +  * ''sub cleanup($self, $context)'' -- executed once after all input keys are processed. The ''$context'' can be used to both produce (key, value) pairs and increment counters.
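
The iterator protocol above (call ''next'' first, then read ''value'') can be sketched like this. ''MockValues'', ''MockContext'' and ''SumReducer'' are hypothetical names used only for illustration; the mocks imitate the documented behaviour in plain Perl, and a real reducer would presumably ''use Moose;'' and declare ''with 'Hadoop::Reducer';''.

```perl
use strict;
use warnings;

# Hypothetical iterator imitating $values: no current value until next().
package MockValues;
sub new {
    my ($class, @values) = @_;
    return bless { values => [@values], index => -1 }, $class;
}
sub next {
    my ($self) = @_;
    $self->{index}++ if $self->{index} < @{ $self->{values} };
    return $self->{index} < @{ $self->{values} };
}
sub value {
    my ($self) = @_;
    return undef if $self->{index} < 0 || $self->{index} >= @{ $self->{values} };
    return $self->{values}[ $self->{index} ];
}

# Hypothetical stand-in for $context: it only records written pairs.
package MockContext;
sub new { return bless { pairs => [] }, shift }
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [$key, $value];
}

# Hypothetical reducer that sums all values of a key.
package SumReducer;
sub new { return bless {}, shift }
sub reduce {
    my ($self, $key, $values, $context) = @_;
    my $sum = 0;
    while ($values->next) {       # advance first, then read the value
        $sum += $values->value;
    }
    $context->write($key, $sum);
}

package main;
my $context = MockContext->new;
SumReducer->new->reduce('be', MockValues->new(1, 1), $context);
print join("\t", @{ $context->{pairs}[0] }), "\n";   # prints "be", a tab, then 2
```

The ''while ($values%%->%%next)'' loop is the natural fit for this protocol, because ''next'' both advances the iterator and reports whether a value is available.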
 +
 +===== Hadoop::Partitioner =====
 +
 +<code perl>
 +package Hadoop::Partitioner;
 +use Moose::Role;
 +
 +requires 'getPartition';
 +
 +sub setup {}
 +sub cleanup {}
 +
 +</code>
 +  * ''sub getPartition($self, $key, $value, $partitions)'' -- executed for every output (key, value) pair. It must return the number of the partition, in the range 0..$partitions-1, in which the output (key, value) pair should be placed.
 +  * ''sub setup($self)'' -- executed once before any input (key, value) pairs are processed.
 +  * ''sub cleanup($self)'' -- executed once after all input (key, value) pairs are processed. 
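
A partitioner satisfying the range requirement above might look like the following sketch. ''KeyHashPartitioner'' is a hypothetical name, and the simple polynomial hash is an arbitrary choice made for illustration, not something prescribed by the API. The class is plain Perl so that it runs standalone; a real partitioner would presumably ''use Moose;'' and declare ''with 'Hadoop::Partitioner';''.

```perl
use strict;
use warnings;

# Hypothetical partitioner: hashes the key so that equal keys always land
# in the same partition.
package KeyHashPartitioner;
sub new { return bless {}, shift }
sub getPartition {
    my ($self, $key, $value, $partitions) = @_;
    my $hash = 0;
    # Taking the modulus at every step keeps the result in 0..$partitions-1.
    $hash = ($hash * 31 + ord($_)) % $partitions for split //, $key;
    return $hash;
}

package main;
my $partitioner = KeyHashPartitioner->new;
for my $key (qw(to be or not)) {
    printf "%s\t%d\n", $key, $partitioner->getPartition($key, 1, 4);
}
```

Because the result depends only on ''$key'', all values of one key reach the same reducer, which is what a word-count style job needs.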
  
