Differences

This shows you the differences between two versions of the page.

--- courses:mapreduce-tutorial:step-26 [2012/01/28 15:41]
straka
+++ courses:mapreduce-tutorial:step-26 [2012/01/28 20:26]
straka
@@ Line 1: / Line 1: @@
-====== MapReduce Tutorial : Counters and job configuration ======
+====== MapReduce Tutorial : Counters, compression and job configuration ======
 ===== Counters =====
@@ Line 22: / Line 22: @@
 }
 </code>
+===== Compression =====
+The output files can be compressed using:
+<code java>
+  FileOutputFormat.setCompressOutput(job, true);
+</code>
+The default compression format is ''deflate'' -- raw Zlib compression. Several other compression formats can be selected:
+<code java>
+import org.apache.hadoop.io.compress.*;
+  ...
+  FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);   //.gz
+  FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);  //.bz2
+</code>
+Files in any of these formats are decompressed transparently when they are read.
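To make the difference between the ''deflate'' (zlib) and ''.gz'' formats more concrete, here is a minimal sketch that does not use Hadoop at all. It relies only on ''java.util.zip'' from the JDK, and the class name ''CodecSketch'' and its helper methods are hypothetical names chosen for this illustration. It compresses the same bytes in both formats and restores them again, which is roughly what happens behind the scenes when compressed output is written and later read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, JDK only; not part of Hadoop or of the tutorial.
class CodecSketch {

  // Compress into the zlib format (deflate data with a zlib header).
  static byte[] zlibCompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (OutputStream out = new DeflaterOutputStream(buffer)) {
      out.write(data);
    }
    return buffer.toByteArray();
  }

  // Compress into the gzip format (deflate data with a gzip header and trailer).
  static byte[] gzipCompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (OutputStream out = new GZIPOutputStream(buffer)) {
      out.write(data);
    }
    return buffer.toByteArray();
  }

  static byte[] zlibDecompress(byte[] compressed) throws IOException {
    try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      return readAll(in);
    }
  }

  static byte[] gzipDecompress(byte[] compressed) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return readAll(in);
    }
  }

  // Copy a stream into a byte array using a small fixed buffer.
  private static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int n;
    while ((n = in.read(chunk)) != -1) {
      buffer.write(chunk, 0, n);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "to be or not to be, to be or not to be".getBytes(StandardCharsets.UTF_8);
    byte[] zlib = zlibCompress(data);
    byte[] gzip = gzipCompress(data);
    System.out.println("zlib round trip ok: " + Arrays.equals(data, zlibDecompress(zlib)));
    System.out.println("gzip round trip ok: " + Arrays.equals(data, gzipDecompress(gzip)));
  }
}
```

Both formats wrap the same deflate algorithm and differ only in their headers and trailers, so the compression ratio is nearly the same; the practical difference is which tools can open the resulting files directly.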
 ===== Job configuration =====
@@ Line 36: / Line 54: @@
 Apart from the already mentioned [[.:step-9#a-brief-list-of-hadoop-options|brief list of Hadoop properties]], there is one important Java-specific property:
+  * **''mapred.child.java.opts''** with default value **''-Xmx200m''**. This property sets the Java options used for every task attempt (mapper, reducer, combiner, partitioner). The default value ''-Xmx200m'' limits the maximum size of the memory allocation pool (the Java heap) to 200MB. If your mappers and reducers need 1GB of memory, use ''-Xmx1024m''. Other Java options can be found in ''man java''.
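
One way to check which heap limit a JVM actually runs with is to ask the runtime itself. The sketch below uses only the standard ''Runtime.maxMemory()'' call; the class name ''HeapCheck'' is a hypothetical name for this illustration, and calling the helper from a mapper or reducer is an assumption about how you might use it, not something the tutorial prescribes.

```java
// Hypothetical helper class, JDK only; not part of Hadoop or of the tutorial.
class HeapCheck {

  // Maximum amount of memory the JVM will attempt to use for the heap, in megabytes.
  // The JVM may report slightly less than the -Xmx value, and a very large
  // number when no limit applies.
  static long maxHeapMegabytes() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
  }
}
```

Running it as ''java -Xmx200m HeapCheck'' and then as ''java -Xmx1024m HeapCheck'' shows how the option changes the reported limit.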

Institute of Formal and Applied Linguistics Wiki
