Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-3 [2012/01/24 19:03]
straka created
+++ courses:mapreduce-tutorial:step-3 [2012/01/24 21:11]
straka
@@ Line 1: / Line 1: @@
-====== MapReduce Tutorial : ======
+====== MapReduce Tutorial : Basic mapper ======
+The simplest MR job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper.
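The mapper-only model described above can be imitated in plain Perl without Hadoop. This is only a minimal sketch: ''identity_map'' and ''run_map_only'' are hypothetical names invented for illustration, and the in-memory "parts" stand in for the input splits that Hadoop would create itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a mapper: receives one (key, value) pair and
# returns the list of pairs to emit. This one passes its input through.
sub identity_map {
    my ($key, $value) = @_;
    return ([$key, $value]);
}

# Simulate a mapper-only job: every input part is handled on its own,
# and every part yields its own output (one "file" per mapper).
sub run_map_only {
    my ($mapper, @parts) = @_;
    my @outputs;
    for my $part (@parts) {
        my @emitted = map { $mapper->(@$_) } @$part;
        push @outputs, \@emitted;
    }
    return @outputs;
}

# Two input parts, as if the input data had been split in two.
my @parts = (
    [ [ 'k1', 'v1' ], [ 'k2', 'v2' ] ],
    [ [ 'k3', 'v3' ] ],
);

my @outputs = run_map_only(\&identity_map, @parts);
for my $i (0 .. $#outputs) {
    print "part $i: ", join(', ', map { join('=', @$_) } @{ $outputs[$i] }), "\n";
}
```

Each part is processed without looking at any other part, which is why the mappers can run independently and why the job ends with as many outputs as there were parts.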
+===== Example Perl mapper =====
+<file perl mapper.pl>
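+#!/usr/bin/perl
+# Mapper class: consumes the Hadoop::Mapper role and implements map().
+package Mapper;
+use Moose;
+with 'Hadoop::Mapper';
+# Identity mapper: writes every input (key, value) pair to the output unchanged.
+sub map {
+  my ($self, $key, $value, $context) = @_;
+  $context->write($key, $value);
+}
+# Driver: configures the job and runs it.
+package Main;
+use Hadoop::Runner;
+my $runner = Hadoop::Runner->new(
+  mapper => Mapper->new(),
+  input_format => 'TextInputFormat',
+  output_format => 'TextOutputFormat',
+  output_compression => 0);
+$runner->run();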
+</file>
+The values ''input_format'', ''output_format'' and ''output_compression'' can be omitted here, because all three are set to their default values.
+The resulting script can be executed locally (not distributed) using
+  perl script.pl run input_directory output_directory
+All files in input_directory are processed. The output_directory must not exist before the job is run.
+===== Exercise =====
+To check that your Hadoop environment works, try running an MR job on ''/home/straka/wiki/cs-text'' that outputs only articles whose names begin with ''a'' (ignoring case).
+{{.:exercise.txt|Solution.pl}}
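The filtering condition of the exercise can be tried out in isolation before wiring it into a mapper. The sketch below is not the linked solution: ''starts_with_a'' is a hypothetical helper, and it assumes that the article name is what the mapper receives as ''$key''. Under that assumption, the ''map'' method would call ''$context->write($key, $value)'' only when the helper returns true.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the tutorial's API): returns 1 when the
# article name starts with "a" or "A", and 0 otherwise.
sub starts_with_a {
    my ($name) = @_;
    return $name =~ /^a/i ? 1 : 0;
}

# Made-up article names, used only to exercise the helper.
for my $name ('Abeceda', 'auto', 'Brno') {
    print "$name\n" if starts_with_a($name);
}
```

The pattern matches only the plain letter ''a'' in either case. Whether accented variants should also count is left open by the exercise text, so they are not handled here.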

Institute of Formal and Applied Linguistics Wiki