Differences
This shows you the differences between two versions of the page.
courses:mapreduce-tutorial:step-3 [2012/01/24 19:03] straka created |
courses:mapreduce-tutorial:step-3 [2012/01/24 21:22] straka |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== MapReduce Tutorial : ====== | + | ====== MapReduce Tutorial : Basic mapper ====== |
+ | |||
+ | The simplest MR job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper. | ||
+ | |||
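The data flow described above can be imitated on a single machine. The following is only a toy sketch, not Hadoop and not the tutorial's framework: it assumes each file in the input directory counts as one part, and the names `map_part`, `run_map_only`, and the `part-NNNNN` output naming are hypothetical, chosen for illustration.

```shell
#!/usr/bin/env bash
# Toy, single-machine imitation of a mapper-only job (not Hadoop itself).
set -euo pipefail

# Hypothetical stand-in for a mapper: upper-cases every line of one part.
map_part() {
  tr '[:lower:]' '[:upper:]' < "$1"
}

# Treat every regular file in the input directory as one part, run the
# stand-in mapper on each part independently, and write one output file
# per part into a single output directory.
run_map_only() {
  local input_dir=$1 output_dir=$2 i=0 part
  mkdir -p "$output_dir"
  for part in "$input_dir"/*; do
    [ -f "$part" ] || continue
    # The part-NNNNN file naming is illustrative only.
    map_part "$part" > "$output_dir/$(printf 'part-%05d' "$i")"
    i=$((i + 1))
  done
}
```

Calling `run_map_only input_directory output_directory` leaves as many output files as there were input parts. No part depends on any other, which is what lets a real cluster run the mappers in parallel.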
+ | ===== Example Perl mapper ===== | ||
+ | |||
+ | <file perl mapper.pl> | ||
+ | # | ||
+ | |||
+ | package Mapper; | ||
+ | use Moose; | ||
+ | with ' | ||
+ | |||
+ | sub map { | ||
+ | my ($self, $key, $value, $context) = @_; | ||
+ | |||
+ | $context-> | ||
+ | } | ||
+ | |||
+ | package Main; | ||
+ | use Hadoop:: | ||
+ | |||
+ | my $runner = Hadoop:: | ||
+ | mapper => Mapper-> | ||
+ | input_format => ' | ||
+ | output_format => ' | ||
+ | output_compression => 0); | ||
+ | |||
+ | $runner-> | ||
+ | </ | ||
+ | |||
+ | The values '' | ||
+ | |||
+ | The resulting script can be executed locally (not distributed) using | ||
+ | perl script.pl run input_directory output_directory | ||
+ | All files in input_directory are processed. The output_directory must not exist. | ||
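Because the run fails when the output directory already exists, it can help to check both directories before starting. The wrapper below is a minimal sketch under that assumption. The function name `run_local` is hypothetical, and the only command it issues is the `perl script.pl run input_directory output_directory` invocation shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the local run command.
# Usage: run_local script.pl input_directory output_directory
run_local() {
  local script=$1 input_dir=$2 output_dir=$3
  # All files in the input directory are processed, so it has to exist.
  if [ ! -d "$input_dir" ]; then
    echo "input directory not found: $input_dir" >&2
    return 1
  fi
  # The output directory must not exist before the job starts.
  if [ -e "$output_dir" ]; then
    echo "output directory must not exist: $output_dir" >&2
    return 1
  fi
  perl "$script" run "$input_dir" "$output_dir"
}
```

The wrapper refuses to run rather than deleting an existing output directory, so results from an earlier run are never removed silently.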
+ | |||
+ | ===== Exercise ===== | ||
+ | |||
+ | To check that your Hadoop environment works, try running an MR job on ''/ | ||
+ | |||
+ | {{.: |