
MapReduce Tutorial - Perl API

The main class is Hadoop::Runner:

package Hadoop::Runner;

has 'mapper' => (does => 'Hadoop::Mapper', required => 1);
has 'reducer' => (does => 'Hadoop::Reducer');
has 'combiner' => (does => 'Hadoop::Reducer');
has 'partitioner' => (does => 'Hadoop::Partitioner');

has 'input_format' => (isa => 'InputFormat', default => 'TextInputFormat');
has 'output_format' => (isa => 'OutputFormat', default => 'TextOutputFormat');
has 'output_compression' => (isa => 'Bool', default => 0);

has 'hadoop_prefix' => (isa => 'Str', default => '/SGE/HADOOP/active');
has 'keep_env' => (isa => 'ArrayRef[Str]', default => sub { ["PERLLIB", "PERL5LIB"] });

sub run();

mapper – a Hadoop::Mapper to use
reducer – an optional Hadoop::Reducer to use
combiner – an optional Hadoop::Reducer to use as combiner
partitioner – an optional Hadoop::Partitioner to use
input_format – one of TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat
output_format – one of TextOutputFormat, SequenceFileOutputFormat
output_compression – Bool flag controlling the compression of output
hadoop_prefix – the prefix of the Hadoop installation. The default value works on the UFAL cluster.
keep_env – the environment variables that are preserved when running Perl mappers, reducers, combiners and partitioners
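
As a rough usage sketch, the attributes above might be combined as follows. This page does not show the interface of the Hadoop::Mapper role, so the map method signature and the $context->write call are assumptions, and the My::Mapper package name is purely illustrative. Only the Hadoop::Runner attribute names and values are taken from the listing above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapper class; the name My::Mapper is illustrative only.
package My::Mapper;
use Moose;
with 'Hadoop::Mapper';

# ASSUMPTION: the role expects a map method that is called with
# ($self, $key, $value, $context), and $context offers a write method.
# Neither is documented on this page.
sub map {
  my ($self, $key, $value, $context) = @_;
  $context->write($key, $value);   # pass every record through unchanged
}

package main;
use Hadoop::Runner;

# Only the required mapper is supplied; the remaining attributes are
# spelled out with the default values from the class listing.
my $runner = Hadoop::Runner->new(
  mapper             => My::Mapper->new(),
  input_format       => 'TextInputFormat',
  output_format      => 'TextOutputFormat',
  output_compression => 0,
  keep_env           => ['PERLLIB', 'PERL5LIB'],
);

$runner->run();
```

The reducer, combiner and partitioner attributes are optional and are simply left out here. hadoop_prefix is also omitted, so its default applies.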


Institute of Formal and Applied Linguistics Wiki
