Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-3 [2012/01/24 19:14]
straka
+++ courses:mapreduce-tutorial:step-3 [2012/01/24 20:09]
straka
@@ Line 3: / Line 3: @@
 The simplest MR job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper.
-===== Example perl mapper =====
+===== Example Perl mapper =====
-<code perl>
+<code perl mapper.pl>
 #!/usr/bin/perl
@@ Line 32: / Line 32: @@
 The values ''input_format'', ''output_format'' and ''output_compression'' could be left out, because they are all set to their default value.
-Resulting script can be executed using
+The resulting script can be executed locally (not distributed) using
   perl script.pl run input_directory output_directory
 All files in input_directory are processed. The output_directory must not exist.
+===== Exercise =====
+To check that your Hadoop environment works, try running a MR job on ''/home/straka/wiki/cs-text'' that outputs only articles whose names begin with ''a'' (ignoring case).
