
Institute of Formal and Applied Linguistics Wiki



This is an old revision of the document!



MapReduce Tutorial: Predefined formats and types

Currently there are two different Java APIs: the older org.apache.hadoop.mapred and the newer org.apache.hadoop.mapreduce.

When browsing through the documentation, make sure to stay in the org.apache.hadoop.mapreduce namespace.

Types

The Java API differs from the Perl API in one important aspect: the keys and values are typed.

The type of a value must implement Writable, which provides methods for serializing and deserializing values.

The type of a key must implement WritableComparable, which combines the Writable and Comparable interfaces.
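
The two requirements above can be illustrated with a minimal sketch of a custom key type. To keep the example runnable without Hadoop on the classpath, it declares local stand-ins for Writable and WritableComparable that mirror the methods of the real interfaces in org.apache.hadoop.io; in an actual job you would import those instead. The WordPosition class and its roundTrip helper are hypothetical names introduced only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in (assumption): mirrors the two methods of Hadoop's Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stand-in (assumption): mirrors Hadoop's WritableComparable.
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical key type: a word together with its position.
class WordPosition implements WritableComparable<WordPosition> {
    String word = "";
    int position;

    // No-argument constructor so an empty instance can be filled by readFields.
    WordPosition() {}

    WordPosition(String word, int position) {
        this.word = word;
        this.position = position;
    }

    // Serialization: write the fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(position);
    }

    // Deserialization: read the fields back in the same order.
    @Override
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        position = in.readInt();
    }

    // Ordering used for keys: by word first, then by position.
    @Override
    public int compareTo(WordPosition other) {
        int byWord = word.compareTo(other.word);
        return byWord != 0 ? byWord : Integer.compare(position, other.position);
    }

    // Serializes the key to bytes and reads it back into a fresh object.
    static WordPosition roundTrip(WordPosition key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            WordPosition copy = new WordPosition();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A type that is only ever used as a value needs just the write and readFields pair; compareTo matters only when the type is used as a key.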

Here is a list of frequently used types:

For more complicated types such as variable-length encoded integers, dictionaries, Bloom filters, etc., see the documentation of Writable.

Types used in Perl API

The Perl API can process keys and values of any type: when a type other than Text is used, its toString method is called to create a String representation.

The keys and values produced by the Perl API are always of type Text.

Input formats

The input formats are the same as in the Perl API. Every input format also specifies which types it can provide.

An input format is a subclass of FileInputFormat<K,V>, where K is the type of keys and V is the type of values it can load.

Available input formats:

Output formats

An output format is a subclass of FileOutputFormat<K,V>, where K is the type of keys and V is the type of values it can store.

Available output formats:

