Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-5 [2012/01/28 12:52]
majlis Added links to previous and next chapter.
+++ courses:mapreduce-tutorial:step-5 [2012/01/29 21:30]
straka
@@ Line 3: / Line 3: @@
 The interesting part of a Hadoop job is the //reducer// -- after all mappers produce the (key, value) pairs, for every unique key and all its values a ''reduce'' function is called. The ''reduce'' function can output (key, value) pairs, which are written to disk.
-The ''reduce'' is similar to ''map'', but instead of one value it gets an iterator, which enumerates all values associated with the key:
+The ''reduce'' is similar to ''map'', but instead of one value it gets an iterator (instance of ''Hadoop::Runner::ValueIterator''), which enumerates all values associated with the key:
 <file perl>
@@ Line 39: / Line 39: @@
 As before, Hadoop silently handles failures. Even a successfully finished mapper may need to be executed again if the machine where its output data were stored gets disconnected from the network.
+===== Types of keys and values =====
+Currently, in the Perl API, both keys and values are strings, which are stored and loaded as UTF-8. If you need more complex structures, you have to serialize and deserialize them yourself.
+The Java API offers a wide range of types, including user-defined types, to be used for keys and values.
 ===== Exercise 1 =====

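The "Types of keys and values" section added in this revision says that the Perl API passes keys and values as strings, so complex structures have to be serialized by hand. A minimal sketch of one way to do that follows, assuming JSON is an acceptable encoding and using the core ''JSON::PP'' module. The helpers ''pack_value'' and ''unpack_value'' are hypothetical names and not part of the ''Hadoop::Runner'' API. The encoder is deliberately created without the ''utf8'' option, on the assumption that the API does the UTF-8 encoding itself when it stores the string.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts hash keys, so equal structures always yield the same string.
# No ->utf8 here: we assume (not confirmed by the tutorial) that the Perl API
# encodes the resulting character string to UTF-8 on its own.
my $json = JSON::PP->new->canonical;

# Hypothetical helper: turn a hash or array reference into a single string
# that can be emitted as a key or a value.
sub pack_value {
    my ($data) = @_;
    return $json->encode($data);
}

# Hypothetical helper: rebuild the structure from a string read back
# in map or reduce.
sub unpack_value {
    my ($string) = @_;
    return $json->decode($string);
}

# Round trip of a small nested structure.
my $packed = pack_value({ count => 3, words => ['a', 'b'] });
my $data   = unpack_value($packed);
print "$packed\n";
print "$data->{count} $data->{words}[1]\n";
```

Sorted keys matter mostly when the serialized string is used as a key: two structures with the same content then map to the same string and end up in the same ''reduce'' call.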