Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-3 [2012/01/25 21:33]
straka
+++ courses:mapreduce-tutorial:step-3 [2012/01/27 21:01]
straka
@@ Line 3: / Line 3: @@
 The simplest Hadoop job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper.
-The Hadoop framework handles
+The Hadoop framework silently handles failures. If a mapper task fails, another attempt is executed and the output of the failed attempt is discarded.
 ===== Example Perl mapper =====
-<file perl mapper.pl>
+<file perl>
 #!/usr/bin/perl
@@ Line 34: / Line 34: @@
 The values ''input_format'', ''output_format'' and ''output_compression'' could be left out, because they are all set to their default value.
-Resulting script can be executed locally (not distributed) using
+The resulting script can be executed locally in a single thread using
   perl script.pl run input_directory output_directory
 All files in input_directory are processed. The output_directory must not exist.
@@ Line 40: / Line 40: @@
 ===== Exercise =====
-To check that your Hadoop environment works, try running a MR job on ''/home/straka/wiki/cs-text'', which outputs only articles with names beginning with a (ignoring the case).
+To check that your Hadoop environment works, try running an MR job on ''/home/straka/wiki/cs-text'' that outputs only articles whose names begin with ''A'' (ignoring case). You can download the template {{:courses:mapreduce-tutorial:step-3-exercise.txt|step-3-exercise.pl}} and execute it using
+  rm -rf step-3-output; perl step-3-exercise.pl run /home/straka/wiki/cs-text step-3-output
 {{.:step-3-solution.txt|Solution.pl}}
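
The filtering condition from the exercise can be tried out without Hadoop. The following is only a minimal sketch in plain Perl, not the linked solution and not the tutorial's mapper API: the helper names ''keep_article'' and ''filter_articles'' and the sample data are hypothetical, and the pattern matches only the ASCII letters ''a'' and ''A'' (accented variants such as ''Á'' are assumed to be out of scope).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the tutorial's Hadoop module):
# returns 1 when the article name starts with "a" or "A", 0 otherwise.
sub keep_article {
    my ($name) = @_;
    return $name =~ /^a/i ? 1 : 0;
}

# Hypothetical stand-in for a mapper: takes [name, text] pairs
# and returns only the pairs whose name passes the filter.
sub filter_articles {
    my (@pairs) = @_;
    return grep { keep_article($_->[0]) } @pairs;
}

# Made-up sample input; a real job would read the articles from the input directory.
my @sample = (
    ['Albert Einstein', 'text of the article'],
    ['Brno',            'text of the article'],
    ['atom',            'text of the article'],
);

# Print the names of the articles that are kept.
print "$_->[0]\n" for filter_articles(@sample);
```

In an actual mapper, the same test would decide whether a given key and value pair is written to the output.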


Institute of Formal and Applied Linguistics Wiki
