Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-3 [2012/01/24 19:14] straka
+++ courses:mapreduce-tutorial:step-3 [2012/01/27 21:01] straka
@@ Line 1: / Line 1: @@
 ====== MapReduce Tutorial : Basic mapper ======
-The simplest MR job consists of a mapper only.  The input data is divided in several parts, every processed by an independent mapper, and the results are collected in one directory, one file per mapper.
+The simplest Hadoop job consists of a mapper only.  The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper.
-===== Example perl mapper =====
+The Hadoop framework silently handles failures. If a mapper task fails, another is executed and the output of the failed attempt is discarded.
-<code perl>
+===== Example Perl mapper =====
+<file perl>
 #!/usr/bin/perl
@@ Line 28: / Line 30: @@
 $runner->run();
-</code>
+</file>
 The values ''input_format'', ''output_format'' and ''output_compression'' can be left out, because they are all set to their default values.
-Resulting script can be executed using
+The resulting script can be executed locally in a single thread using
   perl script.pl run input_directory output_directory
 All files in input_directory are processed. The output_directory must not exist.
+===== Exercise =====
+To check that your Hadoop environment works, try running an MR job on ''/home/straka/wiki/cs-text'' that outputs only articles whose names begin with an ''A'' (ignoring case). You can download the template {{:courses:mapreduce-tutorial:step-3-exercise.txt|step-3-exercise.pl}} and execute it using
+  rm -rf step-3-output; perl step-3-exercise.pl run /home/straka/wiki/cs-text step-3-output
+{{.:step-3-solution.txt|Solution.pl}}
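
The filtering logic the exercise asks for can be sketched in plain Perl, independently of the Hadoop wrapper used by the template. This is only a sketch under assumptions: the helper names ''keep_article'' and ''filter_lines'' are hypothetical, and the record layout (article name, a tab, then the article text) is assumed rather than taken from the tutorial. The match is ASCII-only, so names starting with an accented letter such as ''Á'' are not kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the tutorial's template: returns 1
# when the article name starts with 'A' or 'a', 0 otherwise.
sub keep_article {
    my ($name) = @_;
    return $name =~ /^a/i ? 1 : 0;
}

# Assumed record layout: "name<TAB>text", one article per line.
# Returns the records whose name passes keep_article, unchanged.
sub filter_lines {
    my @kept;
    for my $line (@_) {
        my ($name, $text) = split /\t/, $line, 2;
        next unless defined $name && keep_article($name);
        push @kept, $line;
    }
    return @kept;
}

# Small made-up sample, only to show the call.
my @sample = ("Albert Einstein\tphysicist", "Brno\tcity", "atom\tparticle");
print "$_\n" for filter_lines(@sample);
```

In an actual mapper, the same test would decide whether a given key/value pair is written to the output or skipped.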


Institute of Formal and Applied Linguistics Wiki
