MapReduce tutorial: Input and output formats, testing data

The MapReduce framework works with (key, value) pairs throughout. These pairs can be read from a file and written to a file, and several file formats are available.
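To make the idea of (key, value) pairs stored in a file concrete, here is a minimal Python sketch. It is not the framework's API: the function names `read_pairs` and `write_pairs` are hypothetical, and the layout of one pair per line with a tab between key and value is an assumption made for illustration, since this page does not specify the exact text layout.

```python
import io


def read_pairs(stream):
    """Yield (key, value) tuples from lines of the assumed form 'key<TAB>value'."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        # Split on the first tab only; a line without a tab gets an empty value.
        key, _, value = line.partition("\t")
        yield key, value


def write_pairs(pairs, stream):
    """Write (key, value) tuples back in the same assumed text layout."""
    for key, value in pairs:
        stream.write(f"{key}\t{value}\n")


if __name__ == "__main__":
    source = io.StringIO("apple\t3\nbanana\t5\n")
    pairs = list(read_pairs(source))
    sink = io.StringIO()
    write_pairs(pairs, sink)
    print(pairs)
    print(sink.getvalue() == "apple\t3\nbanana\t5\n")
```

Because both directions use plain text, the data can be inspected with ordinary tools at every stage.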

Input formats

Input files can be compressed; the MR framework decompresses them transparently.
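What "transparently" means can be illustrated outside the framework with a small Python sketch: the caller reads lines the same way whether or not the file is compressed. This is only an analogy and not how the MR framework is implemented. The helper names `open_text` and `read_lines` are hypothetical, and gzip is assumed as the compression format purely for the example.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def open_text(path):
    """Open a text file, decompressing it on the fly if it is gzip-compressed."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_lines(path):
    """Return the lines of a plain or gzip-compressed text file."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "data.txt")
        packed = os.path.join(tmp, "data.txt.gz")
        with open(plain, "w", encoding="utf-8") as f:
            f.write("a\t1\nb\t2\n")
        with gzip.open(packed, "wt", encoding="utf-8") as f:
            f.write("a\t1\nb\t2\n")
        # Both files yield the same lines to the caller.
        print(read_lines(plain) == read_lines(packed))
```

The code that consumes the lines never needs to know which variant of the file it was given.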

Output formats

Output files can be compressed on demand.

Input data

Testing data are available in several formats and sizes:

Using the text format is recommended in this tutorial, so that both input and output files are human-readable.

