====== MapReduce Tutorial : Counters, compression and job configuration ======

===== Counters =====

As in the Perl API, a mapper or a reducer can increment various counters by using ''context.getCounter("Group", "Name").increment(value)'':
<code java>
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  context.getCounter("Group", "Name").increment(1);
  ...
}
</code>
The ''getCounter'' method returns a [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Counter.html|Counter]] object, so if a counter is incremented frequently, the ''getCounter'' method can be called only once:
<code java>
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
  ...
  Counter counter = context.getCounter("Reducer", "Number of values");
  for (IntWritable value : values) {
    ...
    counter.increment(1);
  }
}
</code>
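
After the job finishes, the driver can read the aggregated values of all counters. A minimal sketch, assuming a ''job'' object as in the previous steps and the counter names used above:
<code java>
// Counters are aggregated over all task attempts; the group and
// name must match the strings used in the mapper or reducer.
if (job.waitForCompletion(true)) {
  long count = job.getCounters().findCounter("Reducer", "Number of values").getValue();
  System.err.println("Number of values: " + count);
}
</code>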

===== Compression =====

The output files can be compressed using
<code java>
FileOutputFormat.setCompressOutput(job, true);
</code>

The default compression format is ''deflate'' -- raw Zlib compression. Several other compression formats can be selected:
<code java>
import org.apache.hadoop.io.compress.*;
...
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class); //.gz
FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class); //.bz2
</code>

Of course, any of these formats is decompressed transparently when the file is being read.
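
For example, the compressed output of one job can be used directly as the input of another. A minimal sketch, with a hypothetical path:
<code java>
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
...
// The input format recognizes the .gz suffix and decompresses the
// records on the fly; note that gzip files cannot be split, so each
// file is processed by a single mapper.
FileInputFormat.addInputPath(job, new Path("/user/hypothetical/step-25-out/part-r-00000.gz"));
</code>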

===== Job configuration =====
Apart from the already mentioned [[.:step-9#a-brief-list-of-hadoop-options|brief list of Hadoop properties]], there is one important Java-specific property:
  * **''mapred.child.java.opts''** with default value **''-Xmx200m''**. This property sets the Java options used for every task attempt (mapper, reducer, combiner, partitioner). The default value ''-Xmx200m'' specifies the maximum size of the memory allocation pool. If your mappers and reducers need 1GB of memory, use ''-Xmx1024m''; it can be set as shown in the sketch below. Other Java options can be found in ''man java''.
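
The property can be set on the command line via ''-D'' when the job is run through ''ToolRunner'', or directly in the driver before the job is created. A minimal sketch, with an illustrative value:
<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
...
Configuration conf = new Configuration();
conf.set("mapred.child.java.opts", "-Xmx1024m");  // 1GB heap for every task attempt
Job job = new Job(conf, "memory-hungry-job");
</code>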

----

<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-25|Step 25]]: Reducers, combiners and partitioners.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-27|Step 27]]: Running multiple Hadoop jobs in one source file.<html></td>
</tr>
</table>
</html>