Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-5 [2012/01/24 19:04]
straka created
courses:mapreduce-tutorial:step-5 [2012/01/24 22:30]
straka
Line 1: Line 1:
-====== MapReduce Tutorial : ======
 +====== MapReduce Tutorial : Basic reducer ======
 + 
 +The interesting part of an MR job is the reducer: once all mappers have produced their (key, value) pairs, the ''reduce'' function is called once for every unique key, together with all values for that key. The ''reduce'' function can output (key, value) pairs, which are written to disk.
 +
 +The ''reduce'' function is similar to ''map'', but instead of a single value it receives an iterator that enumerates all values for the key:
 + 
 +<file perl reducer.pl> 
 +package Mapper; 
 +use Moose; 
 +with 'Hadoop::Mapper'; 
 + 
 +# Identity mapper: pass every (key, value) pair through unchanged.
 +sub map {
 +  my ($self, $key, $value, $context) = @_;
 +
 +  $context->write($key, $value);
 +}
 + 
 +package Reducer; 
 +use Moose; 
 +with 'Hadoop::Reducer'; 
 + 
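 +# Called once per unique key; $values iterates over all values of that key.
 +sub reduce {
 +  my ($self, $key, $values, $context) = @_;
 +
 +  # Advance the iterator and copy each value to the output.
 +  while ($values->next) {
 +    $context->write($key, $values->value);
 +  }
 +}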
 + 
 +package Main; 
 +use Hadoop::Runner; 
 + 
 +my $runner = Hadoop::Runner->new( 
 +  mapper => Mapper->new(), 
 +  reducer => Reducer->new()); 
 + 
 +$runner->run(); 
 +</file> 
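The calling convention of ''reduce'' can be tried out without a cluster. The following is a minimal local sketch in plain Perl, not the Hadoop module itself: it groups mapper output by key and calls a reduce function once per unique key. The ''LocalValues'' class and the ''run_reduce'' helper are hypothetical names introduced for illustration; only the ''next''/''value'' iterator interface is taken from ''reducer.pl''.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the values iterator; it mimics only the
# next/value interface used in reducer.pl.
package LocalValues;

sub new {
  my ($class, @values) = @_;
  return bless { values => [@values], cursor => -1 }, $class;
}

# Advance to the next value; returns false once all values are consumed.
sub next {
  my ($self) = @_;
  $self->{cursor}++;
  return $self->{cursor} < scalar @{ $self->{values} };
}

# Return the value the iterator currently points at.
sub value {
  my ($self) = @_;
  return $self->{values}[ $self->{cursor} ];
}

package main;

# Hypothetical helper: group (key, value) pairs by key, call $reduce once
# per unique key, and return the pairs that $reduce wrote.
sub run_reduce {
  my ($pairs, $reduce) = @_;

  my %grouped;
  push @{ $grouped{ $_->[0] } }, $_->[1] for @$pairs;

  my @output;
  my $write = sub { push @output, [@_] };
  # Keys are sorted here only to make the output deterministic.
  for my $key (sort keys %grouped) {
    $reduce->($key, LocalValues->new(@{ $grouped{$key} }), $write);
  }
  return \@output;
}

# Pairs as they might come out of the mappers.
my @pairs = (['b', 1], ['a', 2], ['b', 3]);

# Same loop as in reducer.pl: copy every value to the output.
my $output = run_reduce(\@pairs, sub {
  my ($key, $values, $write) = @_;
  while ($values->next) {
    $write->($key, $values->value);
  }
});

print join("\t", @$_), "\n" for @$output;
```

In this sketch the reduce function is called twice, once for key ''a'' with one value and once for key ''b'' with two values.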
 + 
 +===== Exercise 1 ===== 
 + 
 +Run an MR job on /home/straka/wiki/cs-text-small that counts the occurrences of every word in the article texts.
 + 
 +===== Exercise 2 ===== 
 + 
 +Run an MR job on /home/straka/wiki/cs-text-small that generates an inverted index. For each word, an inverted index lists all its occurrences; each occurrence is a pair (article of occurrence, position of occurrence).
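To make the target structure concrete, here is a small in-memory sketch of an inverted index in plain Perl. It is not an MR job and not a solution to the exercise; the two toy articles, their names, and the choice of whitespace tokenization with 0-based word positions are assumptions made for illustration.

```perl
use strict;
use warnings;

# Toy input: article name => article text (hypothetical data).
my %articles = (
  A1 => 'to be or not to be',
  A2 => 'not now',
);

# word => list of [article, position] pairs.
my %index;
for my $article (sort keys %articles) {
  my @words = split ' ', $articles{$article};
  for my $position (0 .. $#words) {
    push @{ $index{ $words[$position] } }, [$article, $position];
  }
}

# Print each word followed by all of its occurrences.
for my $word (sort keys %index) {
  my @occurrences = map { "($_->[0], $_->[1])" } @{ $index{$word} };
  print "$word: @occurrences\n";
}
```

For the word ''not'', this sketch records the occurrences (A1, 3) and (A2, 0).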
