
Institute of Formal and Applied Linguistics Wiki



courses:mapreduce-tutorial:step-3, revision 2012/01/24 21:11 by straka (page created by straka on 2012/01/24 19:03)
====== MapReduce Tutorial : Basic mapper ======

The simplest MR job consists of a mapper only. The input data is divided into several parts, each of which is processed by an independent mapper, and the results are collected in one directory, one file per mapper.
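The data flow of a mapper-only job can be sketched in plain Perl, without Hadoop. The helper ''run_map_only'' is hypothetical and not part of ''Hadoop::Runner'': it divides a list of (key, value) records into a given number of parts, runs a map function on each part independently and stores each mapper's output under its own ''part-NNNNN'' name. The round-robin split and the file names are assumptions chosen for illustration; Hadoop itself splits real input by files and blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Hadoop or Hadoop::Runner): simulates
# a mapper-only job in memory.
#   $records     - array ref of [key, value] pairs (the whole input)
#   $num_mappers - number of parts the input is divided into
#   $map_fn      - code ref called as $map_fn->($key, $value, $emit)
# Returns a hash ref from an output "file" name to the pairs written
# by the mapper that produced it: one file per mapper.
sub run_map_only {
  my ($records, $num_mappers, $map_fn) = @_;

  # Divide the input into parts. Round-robin is used here only for
  # simplicity; Hadoop splits real input by files and blocks.
  my @parts;
  push @parts, [] for 1 .. $num_mappers;
  for my $i (0 .. $#$records) {
    push @{ $parts[$i % $num_mappers] }, $records->[$i];
  }

  # Each part is handled by an independent mapper, whose output goes
  # to its own file in the common output "directory" (%output).
  my %output;
  for my $m (0 .. $num_mappers - 1) {
    my @written;
    my $emit = sub { push @written, [@_] };
    $map_fn->(@$_, $emit) for @{ $parts[$m] };
    $output{ sprintf('part-%05d', $m) } = \@written;
  }
  return \%output;
}

# Identity mapper: copies every (key, value) pair unchanged.
my $identity = sub {
  my ($key, $value, $emit) = @_;
  $emit->($key, $value);
};

my $out = run_map_only([ [a => 1], [b => 2], [c => 3] ], 2, $identity);
for my $file (sort keys %$out) {
  print "$file: ", join(', ', map { join('=', @$_) } @{ $out->{$file} }), "\n";
}
```

With three records and two mappers, the example prints ''part-00000: a=1, c=3'' and ''part-00001: b=2'': every mapper produced exactly one output file.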

===== Example Perl mapper =====

<file perl mapper.pl>
#!/usr/bin/perl

package Mapper;
use Moose;
with 'Hadoop::Mapper';

# Called for every input record; this mapper copies each
# (key, value) pair to the output unchanged.
sub map {
  my ($self, $key, $value, $context) = @_;

  $context->write($key, $value);
}

package Main;
use Hadoop::Runner;

# Only a mapper is given, so this is a mapper-only job.
my $runner = Hadoop::Runner->new(
  mapper => Mapper->new(),
  input_format => 'TextInputFormat',
  output_format => 'TextOutputFormat',
  output_compression => 0);

$runner->run();
</file>

The ''input_format'', ''output_format'' and ''output_compression'' arguments can be omitted, because they are all set to their default values.

The resulting script can be executed locally (not distributed) using
  perl mapper.pl run input_directory output_directory
All files in ''input_directory'' are processed. The ''output_directory'' must not exist.

===== Exercise =====

To check that your Hadoop environment works, try running an MR job on ''/home/straka/wiki/cs-text'' that outputs only the articles whose names begin with ''a'' (ignoring case).
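The core of the exercise is a single test in the mapper. The following minimal sketch in plain Perl can be tried without Hadoop. It assumes that the key of each input record is the article name (check this against the actual format of ''cs-text''). ''FakeContext'' and ''FilterMapper'' are hypothetical names: ''FakeContext'' only imitates the ''write'' method of the real context. In an actual job, the body of ''map'' would go into a package consuming ''Hadoop::Mapper'', like the example mapper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the Hadoop context: it only records what
# the mapper writes, so the mapper can be tried without Hadoop.
package FakeContext;
sub new { my ($class) = @_; return bless { written => [] }, $class }
sub write {
  my ($self, $key, $value) = @_;
  push @{ $self->{written} }, [$key, $value];
}

# The filtering logic for the exercise, assuming the key of each input
# record is the article name.
package FilterMapper;
sub new { my ($class) = @_; return bless {}, $class }
sub map {
  my ($self, $key, $value, $context) = @_;

  # Keep only articles whose name starts with "a" or "A".
  $context->write($key, $value) if $key =~ /^a/i;
}

package main;

my $context = FakeContext->new();
my $mapper  = FilterMapper->new();
$mapper->map(@$_, $context)
  for ['Algebra', 'text 1'], ['Brno', 'text 2'], ['atom', 'text 3'];

print "$_->[0]\n" for @{ $context->{written} };
```

Of the three sample records, only ''Algebra'' and ''atom'' are written; ''Brno'' is filtered out.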

{{.:exercise.txt|Solution.pl}}
