Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

courses:mapreduce-tutorial:step-23 [2012/01/27 20:11]
straka created
courses:mapreduce-tutorial:step-23 [2012/01/31 14:33] (current)
dusek
Line 1: Line 1:
-====== MapReduce Tutorial : ======
+====== MapReduce Tutorial : Predefined formats and types ======
 + 
 +Currently there are two different Java APIs: 
 +  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapred/package-summary.html|org.apache.hadoop.mapred]]: This is the original API, which is currently //deprecated//.
 +  * [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/package-summary.html|org.apache.hadoop.mapreduce]]: This is the new API, which we will be using in this tutorial. Its only drawback is that some library classes have not yet been converted to the new API, so we cannot use them.
 +When browsing the documentation, make sure to stay in the ''org.apache.hadoop.mapreduce'' package.
 + 
 +===== Types ===== 
 + 
 +The Java API differs from the Perl API in one important aspect: the keys and values are typed.
 + 
 +The type of a value must implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/Writable.html|Writable]], an interface which provides methods for serializing and deserializing values.
 + 
 +The type of a key must implement [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/WritableComparable.html|WritableComparable]], which combines the ''Writable'' and ''Comparable'' interfaces.
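
The contract described above can be sketched without Hadoop on the classpath. In the following minimal example, the ''Writable'' and ''WritableComparable'' interfaces are local stand-ins that mirror the two methods of the Hadoop interfaces (''write'' and ''readFields''), and ''WordPosition'' is a hypothetical key type invented for illustration. In a real job, the class would implement ''org.apache.hadoop.io.WritableComparable'' instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: same two methods),
// so that the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Local stand-in for org.apache.hadoop.io.WritableComparable.
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical key type: a (word, position) pair ordered by word, then by position.
class WordPosition implements WritableComparable<WordPosition> {
    String word = "";
    int position;

    // A no-argument constructor lets an empty instance be created before readFields is called.
    WordPosition() {}

    WordPosition(String word, int position) {
        this.word = word;
        this.position = position;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(position);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        position = in.readInt();
    }

    @Override
    public int compareTo(WordPosition other) {
        int byWord = word.compareTo(other.word);
        return byWord != 0 ? byWord : Integer.compare(position, other.position);
    }

    // Serializes the key to bytes and reads it back into a fresh instance.
    static WordPosition roundTrip(WordPosition key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            WordPosition copy = new WordPosition();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A key type used in a real job would normally also override ''hashCode'' and ''equals'', because keys are assigned to reducers by their hash code by default.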
 + 
 +Here is a list of frequently used types: 
 +  * ''Text'' -- UTF-8 encoded string 
 +  * ''BytesWritable'' -- sequence of arbitrary bytes 
 +  * ''IntWritable'' -- 32-bit integer 
 +  * ''LongWritable'' -- 64-bit integer 
 +  * ''FloatWritable'' -- 32-bit floating-point number 
 +  * ''DoubleWritable'' -- 64-bit floating-point number 
 +  * ''NullWritable'' -- no value 
 +For more complicated types like variable-length encoded integers, dictionaries, bloom filters, etc., see [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/Writable.html|Writable]]. 
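
The widths of the numeric types can be checked with plain ''java.io''. The sketch below assumes that ''IntWritable'', ''LongWritable'', ''FloatWritable'' and ''DoubleWritable'' serialize their value with the corresponding ''DataOutput'' method (''writeInt'', ''writeLong'', ''writeFloat'', ''writeDouble''); the class ''PrimitiveWidths'' and its helper are hypothetical names, not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class PrimitiveWidths {
    // Hypothetical helper interface: one serialization step against a DataOutputStream.
    interface Encoder {
        void encode(DataOutputStream out) throws IOException;
    }

    // Runs the encoder against an in-memory buffer and returns the number of bytes written.
    static int encodedSize(Encoder encoder) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            encoder.encode(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("int:    " + encodedSize(out -> out.writeInt(42)) + " bytes");
        System.out.println("long:   " + encodedSize(out -> out.writeLong(42L)) + " bytes");
        System.out.println("float:  " + encodedSize(out -> out.writeFloat(1.5f)) + " bytes");
        System.out.println("double: " + encodedSize(out -> out.writeDouble(1.5)) + " bytes");
    }
}
```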
 + 
 +==== Types used in Perl API ==== 
 +The Perl API always uses strings as keys and values. From the Java point of view: 
 +  * the type of keys and values produced by the Perl API is always ''Text''.
 +  * any type can be used as input to the Perl API -- if the type is different from ''Text'', its ''toString'' method is used to convert the value to a string before the value is passed to Perl. 
 + 
 +===== Input formats ===== 
 +The input formats are the same as in the Perl API. Every input format also specifies which types it can provide. 
 + 
 +An input format is a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html|FileInputFormat<K,V>]], where //K// is the type of keys and //V// is the type of values it can load. 
 + 
 +Available input formats: 
 +  * ''TextInputFormat'': The type of keys is ''LongWritable'' and the type of values is ''Text''.
 +  * ''KeyValueTextInputFormat'': The type of both keys and values is ''Text''.
 +  * ''SequenceFileInputFormat'': Any type of keys and values can be used. 
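
To make the key and value types of the two text formats concrete, the following sketch emulates with plain Java how records are derived from a text file. It is not Hadoop code: ''InputRecordsSketch'' and its methods are hypothetical, the input is assumed to use ''\n'' line endings, and the default tab separator of ''KeyValueTextInputFormat'' is assumed. With ''TextInputFormat'', the ''LongWritable'' key is the byte offset of the line in the file and the ''Text'' value is the line itself; with ''KeyValueTextInputFormat'', each line is split at the first tab.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class InputRecordsSketch {
    // Emulates TextInputFormat: key = byte offset of the line, value = the line.
    static List<Map.Entry<Long, String>> textRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new SimpleImmutableEntry<Long, String>(offset, line));
            // Advance by the UTF-8 length of the line plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Emulates KeyValueTextInputFormat: the line is split at the first tab;
    // a line without a tab becomes the key, with an empty value.
    static List<Map.Entry<String, String>> keyValueRecords(String content) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : content.split("\n")) {
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);
            records.add(new SimpleImmutableEntry<String, String>(key, value));
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "apple\t3\npear\t5\n";
        System.out.println(textRecords(content));
        System.out.println(keyValueRecords(content));
    }
}
```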
 + 
 +===== Output formats ===== 
 +An output format is a subclass of [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html|FileOutputFormat<K,V>]], where //K// is the type of keys and //V// is the type of values it can store. 
 + 
 +Available output formats: 
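 +  * ''TextOutputFormat'': Keys and values are written as text, one tab-separated pair per line. ''Text'' is the usual type for both; values of other types are converted with ''toString''.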
 +  * ''SequenceFileOutputFormat'': Any type of keys and values can be used. 
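
As a rough illustration of what ''TextOutputFormat'' produces, the sketch below formats one record as a line of text. It is a plain-Java emulation under the assumption of the default tab separator; ''TextOutputSketch'' and ''formatRecord'' are hypothetical names, and special cases such as ''NullWritable'' keys are not handled.

```java
class TextOutputSketch {
    // Formats one record as text: key, TAB, value, newline.
    // Both parts are converted with toString, so any type can be passed in.
    static String formatRecord(Object key, Object value) {
        return key.toString() + "\t" + value.toString() + "\n";
    }

    public static void main(String[] args) {
        System.out.print(formatRecord("apple", 3));
        System.out.print(formatRecord("pear", 5));
    }
}
```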
 + 
 +---- 
 + 
 +<html> 
 +<table style="width:100%"> 
 +<tr> 
 +<td style="text-align:left; width: 33%; "></html>[[step-22|Step 22]]: Optional – Setting Eclipse.<html></td> 
 +<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td> 
 +<td style="text-align:right; width: 33%; "></html>[[step-24|Step 24]]: Mappers, running Java Hadoop jobs, combiners.<html></td> 
 +</tr> 
 +</table> 
 +</html>
  
