MapReduce Tutorial : Compression and job configuration

Compression

The output files can be compressed using

  FileOutputFormat.setCompressOutput(job, true);

The default compression format is deflate – raw Zlib compression. Several other compression formats can be selected:

import org.apache.hadoop.io.compress.*;
 
  ...
  FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);   // .gz
  FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);  // .bz2

Of course, files in any of these formats are decompressed transparently when they are read back.
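As a sketch of how these calls fit into a job driver (the class name, job name and the use of command-line arguments for the paths are placeholder assumptions, not part of the original text), a job writing gzip-compressed output might be configured like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "compressed-output");
        job.setJarByClass(CompressedOutputJob.class);
        // The mapper, reducer and key/value classes would be set here,
        // exactly as in the previous steps of the tutorial.

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Turn on output compression and pick the codec (.gz output files).
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running the same job again with the compressed output directory as input requires no extra configuration, since the input format decompresses the files transparently.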

Job configuration

The job properties can be set on the job's Configuration, either in the source code or on the command line.

Apart from the brief list of Hadoop properties already mentioned, there is one important Java-specific property:
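As an illustrative sketch (the specific property names chosen here, such as `mapreduce.job.reduces`, are examples from the standard Hadoop 2 `mapreduce.*` namespace and are not taken from the original text), properties can be set on the Configuration before the Job is created:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobPropertiesExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Properties set on the Configuration before Job.getInstance
        // are inherited by the job.
        conf.set("mapreduce.job.reduces", "2");          // number of reducers
        conf.setBoolean("mapreduce.map.speculative", false);

        Job job = Job.getInstance(conf, "properties-example");
        // The values can be read back from the job's own configuration.
        System.out.println(job.getConfiguration().get("mapreduce.job.reduces"));
    }
}
```

When the driver is run through ToolRunner, the same properties can instead be supplied on the command line with `-D name=value`, which overrides the defaults without recompiling.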

