Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-3 [2012/01/24 19:03]
straka created
courses:mapreduce-tutorial:step-3 [2012/01/28 11:35]
majlis Added links to previous and next chapter.
Line 1: Line 1:
-====== MapReduce Tutorial : ======
 +====== MapReduce Tutorial : Basic mapper ====== 
 + 
 +The simplest Hadoop job consists of a mapper only. The input data is divided into several parts, each processed by an independent mapper, and the results are collected in one directory, one file per mapper. 
 + 
 +The Hadoop framework handles failures silently. If a mapper task fails, another attempt is executed and the output of the failed attempt is discarded. 
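
The behaviour described above can be imitated in plain Perl, without Hadoop. The following is only a minimal sketch under stated assumptions: ''run_map_only_job'', ''run_attempt'', ''identity_map'' and the sample records are hypothetical names invented for this illustration, not part of the course's Perl API, and the retry logic is a simplification of what the framework does. It shows each input part being processed independently, one ''part-m-NNNNN'' file being written per part, and the partial output of a failed attempt being dropped before the part is retried.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for a mapper: takes one (key, value) record and
# returns the list of records to emit. This one copies its input.
sub identity_map {
    my ($key, $value) = @_;
    return ([$key, $value]);
}

# One task attempt: the output of a whole input part is collected in
# memory and returned only if every record was processed without dying.
sub run_attempt {
    my ($mapper, $part) = @_;
    my @output;
    push @output, $mapper->(@$_) for @$part;
    return \@output;
}

# Simulated mapper-only job: every part is processed independently and
# produces exactly one output file (part-m-00000, part-m-00001, ...).
sub run_map_only_job {
    my ($mapper, $parts, $output_dir, $max_attempts) = @_;
    for my $i (0 .. $#$parts) {
        my $output;
        for my $attempt (1 .. $max_attempts) {
            # A failed attempt dies inside eval; its partial output is lost.
            $output = eval { run_attempt($mapper, $parts->[$i]) };
            last if defined $output;
        }
        die "part $i failed $max_attempts times\n" unless defined $output;

        my $file = File::Spec->catfile($output_dir, sprintf('part-m-%05d', $i));
        open(my $fh, '>', $file) or die "Cannot write $file: $!";
        print {$fh} join("\t", @$_), "\n" for @$output;
        close($fh) or die "Cannot close $file: $!";
    }
}

# Usage: a mapper that crashes once, in the middle of the first part.
my $calls = 0;
my $flaky_map = sub {
    die "simulated crash\n" if ++$calls == 2;
    return identity_map(@_);
};

# Hypothetical sample input, already divided into two parts.
my @parts = (
    [ ['Praha', 'text 1'], ['Brno', 'text 2'] ],
    [ ['Ostrava', 'text 3'] ],
);
my $dir = tempdir(CLEANUP => 1);
run_map_only_job($flaky_map, \@parts, $dir, 3);
```

Because the output of an attempt is written only after the whole part has succeeded, the record emitted before the simulated crash does not appear twice in the result.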
 + 
 +===== Example Perl mapper ===== 
 + 
 +<file perl> 
 +#!/usr/bin/perl 
 + 
 +package Mapper; 
 +use Moose; 
 +with 'Hadoop::Mapper'; 
 + 
 +sub map { 
 +  my ($self, $key, $value, $context) = @_; 
 + 
 +  $context->write($key, $value); 
 +} 
 + 
 +package Main; 
 +use Hadoop::Runner; 
 + 
 +my $runner = Hadoop::Runner->new( 
 +  mapper => Mapper->new(), 
 +  input_format => 'TextInputFormat', 
 +  output_format => 'TextOutputFormat', 
 +  output_compression => 0); 
 + 
 +$runner->run(); 
 +</file> 
 + 
 +The parameters ''input_format'', ''output_format'' and ''output_compression'' can be omitted here, because each is set to its default value. 
 + 
 +The resulting script can be executed locally in a single thread using 
 +  perl script.pl run input_directory output_directory 
 +All files in input_directory are processed. The output_directory must not exist. 
 + 
 +===== Exercise ===== 
 + 
 +To check that your Hadoop environment works, try running an MR job on ''/home/straka/wiki/cs-text'' which outputs only the articles whose names begin with ''A'' (ignoring case). You can download the template {{:courses:mapreduce-tutorial:step-3-exercise.txt|step-3-exercise.pl}} and execute it. 
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-3-exercise.txt' -O 'step-3-exercise.pl' 
 +  rm -rf step-3-out-ex; perl step-3-exercise.pl run /home/straka/wiki/cs-text-medium/ step-3-out-ex 
 +  less step-3-out-ex/part-m-* 
 +   
 +==== Solution ==== 
 +You can also download the solution {{:courses:mapreduce-tutorial:step-3-solution.txt|step-3-solution.pl}} and check the correct output. 
 +  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-3-solution.txt' -O 'step-3-solution.pl' 
 +  rm -rf step-3-out-sol; perl step-3-solution.pl run /home/straka/wiki/cs-text-medium/ step-3-out-sol 
 +  less step-3-out-sol/part-m-* 
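
The filtering condition from the exercise can also be tried in isolation, in plain Perl. This is only a sketch: it assumes that the key passed to the mapper holds the article name, and ''name_starts_with_a'' and the sample records are hypothetical names made up for this illustration. It is not necessarily how the downloadable solution is written.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the article name starts with "A" or "a".
# Accented initial letters such as "Á" are deliberately not matched here.
sub name_starts_with_a {
    my ($name) = @_;
    return $name =~ /^a/i ? 1 : 0;
}

# Hypothetical sample records: [article name, article text].
my @records = (
    ['Albert Einstein', '...'],
    ['Brno',            '...'],
    ['atom',            '...'],
);

# Keep only the matching records. A filtering mapper would do the same by
# calling $context->write($key, $value) only when the test succeeds.
my @kept = grep { name_starts_with_a($_->[0]) } @records;
print join("\t", @$_), "\n" for @kept;
```

Whether names starting with an accented letter should count is a design decision the exercise text leaves open; the sketch takes the narrowest reading.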
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-2|Step 2]]: Input and output format, testing data.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-4|Step 4]]: Counters.<html></td> 
 +</tr> 
 +</table> 
 +</html>
