===== Hadoop::Runner =====

<file perl>
  
has 'hadoop_prefix' => (isa => 'Str', default => '/SGE/HADOOP/active');
has 'copy_environment' => (isa => 'ArrayRef[Str]', default => sub { [] });
  
sub run();
</file>

Attributes of ''Hadoop::Runner'' include:
  * ''output_compression'' -- Bool flag controlling the compression of the output
  * ''hadoop_prefix'' -- the prefix of the Hadoop installation. The default value is fine on the UFAL cluster.
  * ''copy_environment'' -- which environment variables are preserved when running Perl mappers, reducers, combiners and partitioners. This is needed only when running a job using ''-jt'' -- both local execution and execution using the ''-c'' option retain all environment variables. See the sketch after this list.
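
For illustration, a complete runner script might look like the sketch below. The ''mapper'' attribute and the ''My::Mapper'' package are assumptions not documented in this excerpt; ''copy_environment'', ''output_compression'' and ''run()'' are described above.

<file perl>
use strict;
use warnings;
use Hadoop::Runner;
use My::Mapper;    # hypothetical mapper package, see the Hadoop::Mapper section

my $runner = Hadoop::Runner->new(
    mapper             => My::Mapper->new(),    # assumed attribute name
    copy_environment   => ['PERL5LIB'],         # preserved only for -jt runs
    output_compression => 1,                    # compress the job output
);

# run() parses the command line arguments described below.
$runner->run();
</file>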
  
==== Command line arguments supported by Hadoop::Runner::run() ====
  
  script.pl [-jt jobtracker | -c number_of_machines [-w secs]] [-r reducers] [-Dname=value -Dname=value ...] input output
  script.pl --map number_of_reducers
  script.pl --reduce
  script.pl --combine
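
For illustration, a few possible invocations (the paths, the jobtracker address and the Hadoop ''-D'' property are placeholders):

  script.pl my_input my_output                     # local execution
  script.pl -c 10 -w 600 -r 5 my_input my_output   # execution using a cluster of 10 machines
  script.pl -jt machine:9001 -Dmapred.map.tasks=20 my_input my_output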
  
===== Hadoop::Mapper =====

<file perl>
sub map {}
sub cleanup {}
</file>
  * ''sub map($self, $key, $value, $context)'' -- executed for every (key, value) input pair. The variable ''$context'' has the following methods:
    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
    * ''$context%%->%%counter($group, $name, $increment)'' -- increases the counter ''$name'' in the group ''$group'' by ''$increment''
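
As an illustration, a word-counting mapper could look like the sketch below. The package name is arbitrary and composing the ''Hadoop::Mapper'' role with Moose's ''with'' is an assumption inferred from the Moose-based ''Hadoop::Runner''; the ''$context'' methods are the ones documented above.

<file perl>
package My::Mapper;
use Moose;
with 'Hadoop::Mapper';    # assumed Moose role, by analogy with Hadoop::Runner

# Emit a (word, 1) pair for every whitespace-separated word of the input line.
sub map {
    my ($self, $key, $value, $context) = @_;
    for my $word (split ' ', $value) {
        $context->write($word, 1);
        $context->counter('wordcount', 'words', 1);
    }
}

sub cleanup {}

1;
</file>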

===== Hadoop::Reducer =====

  * ''sub reduce($self, $key, $values, $context)'' -- executed once for every key. The variable ''$values'' iterates over all values associated with ''$key'' and has the following methods:
    * ''$values%%->%%value()'' -- returns the current value, or undef if there is none.
    * ''$values%%->%%next()'' -- advances to the next value. Returns true if there is one, false otherwise.
    * At the beginning there is no current value; the first value should be obtained by calling ''next''.
  * ''sub reduce($self, $key, $values, $context)'' -- the variable ''$context'' has the following methods:
    * ''$context%%->%%write($key, $value)'' -- outputs the (''$key'', ''$value'') pair
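
Under the same assumptions as the mapper sketch above (arbitrary package name, assumed ''Hadoop::Reducer'' Moose role), a summing reducer could look like this; note that ''next()'' must be called before the first ''value()'', since there is no current value at the beginning:

<file perl>
package My::Reducer;
use Moose;
with 'Hadoop::Reducer';    # assumed Moose role

# Sum all values belonging to one key and emit a single (key, sum) pair.
sub reduce {
    my ($self, $key, $values, $context) = @_;
    my $sum = 0;
    while ($values->next()) {    # advance first -- there is no current value initially
        $sum += $values->value();
    }
    $context->write($key, $sum);
}

1;
</file>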
