courses:mapreduce-tutorial:perl-api [2012/01/26 17:08] straka |
courses:mapreduce-tutorial:perl-api [2012/01/31 09:38] (current) straka Change Perl commandline syntax. |
| |
has 'hadoop_prefix' => (isa => 'Str', default => '/SGE/HADOOP/active'); | has 'hadoop_prefix' => (isa => 'Str', default => '/SGE/HADOOP/active'); |
has 'keep_env' => (isa => 'ArrayRef[Str]', default => sub { ["PATH", "PERLLIB", "PERL5LIB"] }); | has 'copy_environment' => (isa => 'ArrayRef[Str]', default => sub { [] }); |
| |
sub run(); | sub run(); |
* ''output_compression'' -- Bool flag controlling the compression of output | * ''output_compression'' -- Bool flag controlling the compression of output |
* ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster. | * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster. |
* ''keep_env'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners | * ''copy_environment'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners. Needed only when running a job using ''-jt'' -- both local execution and execution using the ''-c'' option retain all environment variables. |
| |
==== Command line arguments supported by Hadoop::Runner::run() ==== | ==== Command line arguments supported by Hadoop::Runner::run() ==== |
| |
script.pl run [-jt/--jobtracker jobtracker | -c/--cluster machines [-w/--wait secs]] [-r/--reducers reducers] [generic Hadoop options] input_path output_path | script.pl [-jt jobtracker | -c number_of_machines [-w secs]] [-r reducers] [-Dname=value -Dname=value ...] input output |
script.pl map number_of_reducers | script.pl --map number_of_reducers |
script.pl reduce | script.pl --reduce |
script.pl combine | script.pl --combine |
| |
===== Hadoop::Mapper ===== | ===== Hadoop::Mapper ===== |
sub cleanup {} | sub cleanup {} |
</file> | </file> |
* ''sub map($self, $key, $value, $context)'' -- executed for every (key, value) input pair. The variable '$context' has the following methods: | * ''sub map($self, $key, $value, $context)'' -- executed for every (key, value) input pair. The variable ''$context'' has the following methods: |
* ''$context%%->%%write($key, $value)'' -- output the (''$key'', ''$value'') pair | * ''$context%%->%%write($key, $value)'' -- output the (''$key'', ''$value'') pair |
* ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment'' | * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment'' |
* ''$values%%->%%value()'' -- returns the current value, undef if there is none. | * ''$values%%->%%value()'' -- returns the current value, undef if there is none. |
* ''$values%%->%%next()'' -- advance to next value. Returns true if there is any, false otherwise. | * ''$values%%->%%next()'' -- advance to next value. Returns true if there is any, false otherwise. |
* At the beginning there is no current value; the first value must be obtained by calling 'next'. | * At the beginning there is no current value; the first value must be obtained by calling ''next''. |
* ''sub reduce($self, $key, $values, $context)'' -- the variable ''$context'' has the following methods: | * ''sub reduce($self, $key, $values, $context)'' -- the variable ''$context'' has the following methods: |
* ''$context%%->%%write($key, $value)'' -- output the (''$key'', ''$value'') pair | * ''$context%%->%%write($key, $value)'' -- output the (''$key'', ''$value'') pair |
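
The calling conventions listed above for ''map'' and ''reduce'' can be tried out without a cluster. The following is only a minimal sketch: ''MockContext'' and ''MockValues'' are hypothetical stand-ins written for this example (they are not part of the Hadoop Perl API), and ''map_words'' and ''reduce_sum'' are illustrative names. The sketch assumes only the method names documented here: ''write'', ''counter'', ''next'' and ''value''.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the context object: records written pairs and counters.
package MockContext;
sub new { return bless { pairs => [], counters => {} }, shift }
sub write {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [$key, $value];
}
sub counter {
    my ($self, $group, $name, $increment) = @_;
    $self->{counters}{"$group/$name"} += $increment;
}

# Hypothetical stand-in for the values iterator: no current value until next() is called.
package MockValues;
sub new {
    my ($class, @values) = @_;
    return bless { values => [@values], current => undef }, $class;
}
sub next {
    my ($self) = @_;
    if (@{ $self->{values} }) {
        $self->{current} = shift @{ $self->{values} };
        return 1;
    }
    $self->{current} = undef;
    return 0;
}
sub value { return $_[0]{current} }

package main;

# Mapper body: emit (word, 1) for every word of the input value.
sub map_words {
    my ($self, $key, $value, $context) = @_;
    for my $word (split ' ', $value) {
        $context->write($word, 1);
        $context->counter('Example', 'words', 1);
    }
}

# Reducer body: call next() first, then read value(), until next() returns false.
sub reduce_sum {
    my ($self, $key, $values, $context) = @_;
    my $sum = 0;
    while ($values->next()) {
        $sum += $values->value();
    }
    $context->write($key, $sum);
}

my $map_context = MockContext->new;
map_words(undef, 0, 'a b a', $map_context);

my $reduce_context = MockContext->new;
reduce_sum(undef, 'a', MockValues->new(1, 1), $reduce_context);
print "$_->[0]\t$_->[1]\n" for @{ $reduce_context->{pairs} };
```

In a real job the two subroutine bodies would live in classes derived from the mapper and reducer base classes described on this page; the mocks exist only so the iterator protocol (no current value before the first ''next'') can be observed locally.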