Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-5 [2012/01/24 22:15]
straka
+++ courses:mapreduce-tutorial:step-5 [2012/01/29 20:21]
straka
@@ Line 1: / Line 1: @@
 ====== MapReduce Tutorial : Basic reducer ======
-The interesting part of a MR job is the reducer -- after all mappers produce the (key, value) pairs, for every unique key and all its values a ''reduce'' function is called. The ''reduce'' function can output (key, value) pairs, which are written to disk.
+The interesting part of a Hadoop job is the //reducer//: after all mappers have produced their (key, value) pairs, the ''reduce'' function is called once for every unique key, together with all values of that key. The ''reduce'' function can output (key, value) pairs, which are written to disk.
-The ''reduce'' is similar to ''map'', but instead of one value it gets an iterator, which can enumerate all values:
+The ''reduce'' function is similar to ''map'', but instead of a single value it gets an iterator (an instance of ''Hadoop::Runner::ValueIterator''), which enumerates all values associated with the key:
-<file perl reducer.pl>
+<file perl>
 package Mapper;
 use Moose;
@@ Line 38: / Line 38: @@
 </file>
+As before, Hadoop silently handles failures. Even a successfully finished mapper may need to be executed again, if the machine where its output data were stored gets disconnected from the network.
+===== Exercise 1 =====
+Run a Hadoop job on ''/home/straka/wiki/cs-text-small'' that counts the occurrences of every word in the article texts. You can download the template {{:courses:mapreduce-tutorial:step-5-exercise1.txt|step-5-exercise1.pl}} and execute it.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-exercise1.txt' -O 'step-5-exercise1.pl'
+  rm -rf step-5-out-ex1; perl step-5-exercise1.pl run /home/straka/wiki/cs-text-medium/ step-5-out-ex1
+  less step-5-out-ex1/part-*
+==== Solution ====
+You can also download the solution {{:courses:mapreduce-tutorial:step-5-solution1.txt|step-5-solution1.pl}} and check the correct output.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution1.txt' -O 'step-5-solution1.pl'
+  rm -rf step-5-out-sol1; perl step-5-solution1.pl run /home/straka/wiki/cs-text-medium/ step-5-out-sol1
+  less step-5-out-sol1/part-*
+===== Exercise 2 =====
+Run a Hadoop job on ''/home/straka/wiki/cs-text-small'' that generates an inverted index. An inverted index contains, for each word, all its //occurrences//, where each occurrence is a pair (article of occurrence, position of occurrence). You can download the template {{:courses:mapreduce-tutorial:step-5-exercise2.txt|step-5-exercise2.pl}} and execute it.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-exercise2.txt' -O 'step-5-exercise2.pl'
+  rm -rf step-5-out-ex2; perl step-5-exercise2.pl run /home/straka/wiki/cs-text-tiny/ step-5-out-ex2
+  less step-5-out-ex2/part-*
+==== Solution ====
+You can also download the solution {{:courses:mapreduce-tutorial:step-5-solution2.txt|step-5-solution2.pl}} and check the correct output.
+  wget --no-check-certificate 'https://wiki.ufal.ms.mff.cuni.cz/_media/courses:mapreduce-tutorial:step-5-solution2.txt' -O 'step-5-solution2.pl'
+  rm -rf step-5-out-sol2; perl step-5-solution2.pl run /home/straka/wiki/cs-text-tiny/ step-5-out-sol2
+  less step-5-out-sol2/part-*
+----
+<html>
+<table style="width:100%">
+<tr>
+<td style="text-align:left; width: 33%; "></html>[[step-4|Step 4]]: Counters.<html></td>
+<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
+<td style="text-align:right; width: 33%; "></html>[[step-6|Step 6]]: Running on cluster.<html></td>
+</tr>
+</table>
+</html>
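
The reducer contract described in the revision above (one ''reduce'' call per unique key, with an iterator over that key's values) can be tried out without Hadoop. The following is a minimal single-process sketch in plain Perl, not the ''Hadoop::Runner'' API: ''SketchIterator'', ''run_job'', the ''$emit'' callback and the sample articles are all hypothetical names introduced for illustration, and the position of an occurrence is assumed to be a 0-based word index. It shows a word-count job and an inverted-index job built on the same grouping step.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the value iterator handed to reduce:
# next() returns the next value, or undef once the values are exhausted.
package SketchIterator;
sub new  { my ($class, @values) = @_; return bless { values => [@values] }, $class }
sub next { my ($self) = @_; return shift @{ $self->{values} } }

package main;

# Run map over every (key, value) record, group the emitted pairs by key,
# then call reduce once per unique key with an iterator over its values.
sub run_job {
    my ($map, $reduce, @records) = @_;
    my (%groups, @output);
    for my $record (@records) {
        my ($key, $value) = @$record;
        $map->($key, $value, sub { push @{ $groups{ $_[0] } }, $_[1] });
    }
    for my $key (sort keys %groups) {
        $reduce->($key, SketchIterator->new(@{ $groups{$key} }),
                  sub { push @output, [@_] });
    }
    return @output;
}

# Word count: map emits (word, 1); reduce sums the values of each word.
sub wc_map {
    my ($title, $text, $emit) = @_;
    $emit->($_, 1) for split ' ', $text;
}
sub wc_reduce {
    my ($word, $values, $emit) = @_;
    my $sum = 0;
    while (defined(my $value = $values->next)) { $sum += $value }
    $emit->($word, $sum);
}

# Inverted index: map emits (word, "article:position"); reduce joins them.
sub ii_map {
    my ($title, $text, $emit) = @_;
    my @words = split ' ', $text;
    $emit->($words[$_], "${title}:$_") for 0 .. $#words;
}
sub ii_reduce {
    my ($word, $values, $emit) = @_;
    my @occurrences;
    while (defined(my $value = $values->next)) { push @occurrences, $value }
    $emit->($word, join ' ', @occurrences);
}

# Two tiny hypothetical articles given as (title, text) records.
my @articles = (['A', 'to be or not to be'], ['B', 'be quick']);
print join("\t", @$_), "\n" for run_job(\&wc_map, \&wc_reduce, @articles);
print join("\t", @$_), "\n" for run_job(\&ii_map, \&ii_reduce, @articles);
```

The sketch keeps all groups in memory and visits keys in sorted order, which is a simplification. In a real job the values arrive through the iterator precisely because they may not fit in memory, so a reducer should read each value once and not rely on the order of values within a key.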

Institute of Formal and Applied Linguistics Wiki