MapReduce Tutorial - Perl API

Hadoop::Runner

package Hadoop::Runner;
use Moose;
 
has 'mapper' => (does => 'Hadoop::Mapper', required => 1);
has 'reducer' => (does => 'Hadoop::Reducer');
has 'combiner' => (does => 'Hadoop::Reducer');
has 'partitioner' => (does => 'Hadoop::Partitioner');
 
has 'input_format' => (isa => 'InputFormat', default => 'TextInputFormat');
has 'output_format' => (isa => 'OutputFormat', default => 'TextOutputFormat');
has 'output_compression' => (isa => 'Bool', default => 0);
 
has 'hadoop_prefix' => (isa => 'Str', default => '/SGE/HADOOP/active');
has 'keep_env' => (isa => 'ArrayRef[Str]', default => sub { ["PERLLIB", "PERL5LIB"] });
 
sub run();

mapper – a Hadoop::Mapper to use
reducer – an optional Hadoop::Reducer to use
combiner – an optional Hadoop::Reducer to use as combiner
partitioner – an optional Hadoop::Partitioner to use
input_format – one of TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat
output_format – one of TextOutputFormat, SequenceFileOutputFormat
output_compression – Bool flag controlling the compression of output
hadoop_prefix – the prefix of the Hadoop installation. The default value is fine on the UFAL cluster.
keep_env – the environment variables that are preserved when running Perl mappers, reducers, combiners and partitioners
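
As a rough illustration of how these attributes fit together, the sketch below configures a job that has only a mapper. The GrepMapper class and the counter names are made up for this example. This page does not describe how input and output paths reach run(), so that part is left out, and passing the format name as a plain string is an assumption based on the string defaults in the declaration above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapper, defined only for this illustration.
package GrepMapper;
use Moose;
with 'Hadoop::Mapper';

# Emit every input line that mentions "hadoop" and count the matches.
sub map {
    my ($self, $key, $value, $context) = @_;
    if ($value =~ /hadoop/i) {
        $context->write($key, $value);
        $context->counter('Grep', 'Matched lines', 1);
    }
}

package main;
use Hadoop::Runner;

# Only 'mapper' is required; every other attribute falls back to its default.
my $runner = Hadoop::Runner->new(
    mapper             => GrepMapper->new(),
    input_format       => 'TextInputFormat',        # the documented default, spelled out
    output_compression => 1,                        # compress the job output
    keep_env           => ['PERLLIB', 'PERL5LIB'],  # same as the default list
);

$runner->run();
```

The reducer, combiner and partitioner attributes are omitted here because the declaration marks only mapper as required.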

Hadoop::Mapper

package Hadoop::Mapper;
use Moose::Role;
 
requires 'map';
 
sub setup { 1; }
sub cleanup { 1; }

sub map($self, $key, $value, $context) – executed for every (key, value) input pair. The $context object has the following methods:
- $context->write($key, $value) – outputs the ($key, $value) pair
- $context->counter($group, $name, $increment) – increases the counter $name in the group $group by $increment
sub setup($self, $context) – executed once before any input (key, value) pairs are processed. The $context can be used both to produce (key, value) pairs and to increment counters.
sub cleanup($self, $context) – executed once after all input (key, value) pairs are processed. The $context can be used both to produce (key, value) pairs and to increment counters.
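
The map method and the two context calls can be tried out locally with a small stand-in for the context object. This is only a sketch: MockContext and WordCountMapper are hypothetical names, and the mapper is written as a plain Perl class so that it runs without Hadoop. In a real job it would presumably be a Moose class that consumes the role via with 'Hadoop::Mapper'.

```perl
use strict;
use warnings;

# Stand-in for the $context object (hypothetical, for local experiments only).
# It records everything the mapper writes instead of sending it to Hadoop.
package MockContext;

sub new {
    my ($class) = @_;
    my $self = { pairs => [], counters => {} };
    return bless $self, $class;
}

sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [$key, $value];
    return;
}

sub counter {
    my ($self, $group, $name, $increment) = @_;
    $self->{counters}{$group}{$name} += $increment;
    return;
}

# Word-count mapper. In a real job: use Moose; with 'Hadoop::Mapper';
package WordCountMapper;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Emit (word, 1) for every whitespace-separated word of the input value
# and count the processed lines in a counter.
sub map {
    my ($self, $key, $value, $context) = @_;
    for my $word (split ' ', $value) {
        $context->write($word, 1);
    }
    $context->counter('WordCount', 'Lines', 1);
    return;
}

package main;

my $context = MockContext->new();
my $mapper  = WordCountMapper->new();
$mapper->map(0, 'to be or not to be', $context);

# Print the recorded pairs, one tab-separated pair per line.
printf "%s\t%s\n", @{$_} for @{ $context->{pairs} };
```

Keeping the mapper free of any direct Hadoop calls, apart from the methods on $context, is what makes this kind of local check possible.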

Hadoop::Reducer

package Hadoop::Reducer;
use Moose::Role;
 
requires 'reduce';
 
sub setup { 1; }
sub cleanup { 1; }

sub reduce($self, $key, $values, $context) – executed for every (key, values) input pair. The $context object has the following methods:
- $context->write($key, $value) – outputs the ($key, $value) pair
- $context->counter($group, $name, $increment) – increases the counter $name in the group $group by $increment
sub setup($self, $context) – executed once before any input (key, values) pairs are processed. The $context can be used both to produce (key, value) pairs and to increment counters.
sub cleanup($self, $context) – executed once after all input (key, values) pairs are processed. The $context can be used both to produce (key, value) pairs and to increment counters.
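
The order of the setup, reduce and cleanup calls can be sketched with a reducer that only looks at keys. This page does not say how the $values argument is traversed, so the example deliberately ignores it and the local driver passes undef in its place. MockContext, DistinctKeyReducer and the 'distinct_keys' output key are hypothetical names, and the class is plain Perl so that it runs without Hadoop. A real reducer would presumably be a Moose class using with 'Hadoop::Reducer'.

```perl
use strict;
use warnings;

# Stand-in for the $context object (hypothetical, for local experiments only).
package MockContext;

sub new {
    my ($class) = @_;
    my $self = { pairs => [], counters => {} };
    return bless $self, $class;
}

sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [$key, $value];
    return;
}

sub counter {
    my ($self, $group, $name, $increment) = @_;
    $self->{counters}{$group}{$name} += $increment;
    return;
}

# Emits every key once and reports how many keys this reducer has seen.
# In a real job: use Moose; with 'Hadoop::Reducer';
package DistinctKeyReducer;

sub new {
    my ($class) = @_;
    return bless { distinct => 0 }, $class;
}

# Runs once before the first key: reset the per-reducer state.
sub setup {
    my ($self, $context) = @_;
    $self->{distinct} = 0;
    return 1;
}

# Runs once per key; $values is not inspected (its interface is not assumed).
sub reduce {
    my ($self, $key, $values, $context) = @_;
    $self->{distinct}++;
    $context->write($key, '');
    $context->counter('Distinct', 'Keys', 1);
    return;
}

# Runs once after the last key: emit the total collected by this reducer.
sub cleanup {
    my ($self, $context) = @_;
    $context->write('distinct_keys', $self->{distinct});
    return 1;
}

package main;

my $context = MockContext->new();
my $reducer = DistinctKeyReducer->new();

# Local driver imitating the documented call order.
$reducer->setup($context);
$reducer->reduce($_, undef, $context) for qw(be not or to);
$reducer->cleanup($context);

printf "%s\t%s\n", @{$_} for @{ $context->{pairs} };
```

Writing from cleanup works here because the description above states that the $context passed to cleanup can also produce (key, value) pairs.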


Institute of Formal and Applied Linguistics Wiki
