Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-5 [2012/01/24 22:37]
straka
courses:mapreduce-tutorial:step-5 [2012/01/25 21:56]
straka
Line 1: Line 1:
 ====== MapReduce Tutorial : Basic reducer ====== ====== MapReduce Tutorial : Basic reducer ======
  
-The interesting part of a MR job is the reducer -- after all mappers produce the (key, value) pairs, for every unique key and all its values a ''reduce'' function is called. The ''reduce'' function can output (key, value) pairs, which are written to disk.+The interesting part of a Hadoop job is the //reducer// -- after all mappers produce the (key, value) pairs, for every unique key and all its values a ''reduce'' function is called. The ''reduce'' function can output (key, value) pairs, which are written to disk.
  
-The ''reduce'' is similar to ''map'', but instead of one value it gets an iterator, which can enumerate all values:+The ''reduce'' is similar to ''map'', but instead of one value it gets an iterator, which enumerates all values associated with the key:
  
-<file perl reducer.pl>+<file perl>
 package Mapper; package Mapper;
 use Moose; use Moose;
Line 37: Line 37:
 $runner->run(); $runner->run();
 </file> </file>
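The grouping described above can be sketched locally in plain Perl, without Hadoop. This is only an illustration of the semantics (one ''reduce'' call per unique key, with an iterator over that key's values); it does not use the tutorial's runner API, and the names ''@pairs'', ''%values_for'' and ''reduce_max'' are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mapper output: (key, value) pairs in arbitrary order.
my @pairs = (
    [ 'b', 3 ], [ 'a', 1 ], [ 'b', 5 ], [ 'a', 4 ], [ 'c', 2 ],
);

# Group the values by key, as the framework does between map and reduce.
my %values_for;
push @{ $values_for{ $_->[0] } }, $_->[1] for @pairs;

# A reduce function: it receives a key and an iterator over the key's
# values, and returns the (key, value) pairs it wants to output.
# Here it outputs the maximum value seen for the key.
sub reduce_max {
    my ($key, $next_value) = @_;
    my $max;
    while (defined(my $value = $next_value->())) {
        $max = $value if !defined $max || $value > $max;
    }
    return ([ $key, $max ]);
}

# Call reduce once per unique key, handing it an iterator (a closure
# that returns the next value, or undef when the values are exhausted).
my @output;
for my $key (sort keys %values_for) {
    my @values     = @{ $values_for{$key} };
    my $next_value = sub { return shift @values };
    push @output, reduce_max($key, $next_value);
}

# Print the reducer output as tab-separated (key, value) lines.
print join("\t", @$_), "\n" for @output;
```

In a real job the framework performs the grouping and the output is written to disk; the sketch only mimics the order of calls.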
 +
 +As before, Hadoop silently handles failures. Even a successfully finished mapper may need to be executed again, for example if the machine storing its output data gets disconnected from the network.
  
 ===== Exercise 1 ===== ===== Exercise 1 =====
  
-Run a MR job on /home/straka/wiki/cs-text-small, which counts occurences of every word in the article texts.+Run a Hadoop job on ''/home/straka/wiki/cs-text-small'', which counts occurrences of every word in the article texts.
  
 {{:courses:mapreduce-tutorial:step-5-solution1.txt|Solution.pl}} {{:courses:mapreduce-tutorial:step-5-solution1.txt|Solution.pl}}
Line 46: Line 48:
 ===== Exercise 2 ===== ===== Exercise 2 =====
  
-Run a MR job on /home/straka/wiki/cs-text-small, which generates an inverted index. Inverted index contains for each word all its occurrences, each occurrence is pair (article of occurrence, position of occurrence).+Run a Hadoop job on ''/home/straka/wiki/cs-text-small'', which generates an inverted index. An inverted index contains, for each word, all its //occurrences//, where each occurrence is a pair (article of occurrence, position of occurrence).
  
 {{:courses:mapreduce-tutorial:step-5-solution2.txt|Solution.pl}} {{:courses:mapreduce-tutorial:step-5-solution2.txt|Solution.pl}}
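To make the shape of an inverted index concrete, here is a small in-memory sketch in plain Perl. It is not the Hadoop solution: the toy ''%articles'' input is hypothetical, and counting positions as word offsets starting at 0 is an assumption, since the exercise does not say how positions are measured.

```perl
use strict;
use warnings;

# Hypothetical toy input: article name => article text.
my %articles = (
    A => 'to be or not to be',
    B => 'be quick',
);

# Inverted index: word => list of [article, position] occurrences.
# Position is assumed to be the word offset within the article, from 0.
my %index;
for my $article (sort keys %articles) {
    my @words = split ' ', $articles{$article};
    for my $position (0 .. $#words) {
        push @{ $index{ $words[$position] } }, [ $article, $position ];
    }
}

# Print each word followed by its (article, position) pairs.
for my $word (sort keys %index) {
    my @occurrences = map { "($_->[0], $_->[1])" } @{ $index{$word} };
    print "$word\t@occurrences\n";
}
```

In the MapReduce setting the word would naturally serve as the key, so that all occurrences of one word reach the same ''reduce'' call.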
 +
