MapReduce Tutorial: Counters, compression and job configuration
Counters
As in the Perl API, a mapper or a reducer can increment various counters by calling context.getCounter("Group", "Name").increment(value):
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  ...
  context.getCounter("Group", "Name").increment(1);
  ...
}
The getCounter method returns a Counter object, so if a counter is incremented frequently, getCounter needs to be called only once and the returned object reused:
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
  ...
  Counter counter = context.getCounter("Reducer", "Number of values");
  for (IntWritable value : values) {
    ...
    counter.increment(1);
  }
}
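Counter values can also be read back in the driver once the job finishes, via Job.getCounters() and Counters.findCounter. A minimal sketch, assuming a configured Job object named job:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

...

// After the job has finished, fetch the aggregated counters
// and look up one counter by its group and name.
job.waitForCompletion(true);
Counters counters = job.getCounters();
long numValues = counters.findCounter("Reducer", "Number of values").getValue();
System.out.println("Number of values: " + numValues);
```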
Compression
The output files can be compressed using
FileOutputFormat.setCompressOutput(job, true);
The default compression format is deflate – raw Zlib compression. Several other compression formats can be selected:
import org.apache.hadoop.io.compress.*;
...
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);   // .gz
FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);  // .bz2
Of course, files in any of these formats are decompressed transparently when they are read.
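Putting the compression calls together, a driver might configure gzipped output as follows. This is a sketch: the job object and the output path "output" are assumptions, not part of the tutorial's example:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

...

// Compress the job output with gzip; the part files get a .gz suffix.
FileOutputFormat.setOutputPath(job, new Path("output"));
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
```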
Job configuration
The job properties can be set:
- on the command line – the ToolRunner parses options in the format -Dname=value. See the syntax of the hadoop script.
- using the Job object – Job.getConfiguration() returns a Configuration object, which provides the following methods:
  - String get(String name) – get the value of the name property, or null if it does not exist.
  - String get(String name, String defaultValue) – get the value of the name property, or defaultValue if it does not exist.
  - getBoolean, getClass, getFile, getFloat, getInt, getLong, getStrings – return a typed value of the name property (i.e., number, file name, class name, …).
  - set(String name, String value) – set the value of the name property to value.
  - setBoolean, setClass, setFile, setFloat, setInt, setLong, setStrings – set a typed value of the name property (i.e., number, file name, class name, …).
- in a mapper or a reducer – the context object also provides a getConfiguration() method, so the job properties can be accessed in mappers and reducers too.
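For example, a job-specific property can be set in the driver and read back in a mapper through the context. The property name my.threshold below is made up for illustration:

```java
// In the driver (the property name "my.threshold" is hypothetical):
job.getConfiguration().setInt("my.threshold", 10);

// In the mapper, read the property back with a default of 0:
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
  int threshold = context.getConfiguration().getInt("my.threshold", 0);
  ...
}
```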
Apart from the already mentioned brief list of Hadoop properties, there is one important Java-specific property: mapred.child.java.opts, with default value -Xmx200m. This property sets the Java options used for every task attempt (mapper, reducer, combiner, partitioner). The default value -Xmx200m specifies the maximum size of the memory allocation pool. If your mappers and reducers need 1GB of memory, use -Xmx1024m. Other Java options can be found in man java.
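The property can be set like any other job property, either in the driver or on the command line. A sketch (the jar and class names are placeholders):

```java
// In the driver: give every task attempt a 1GB heap.
job.getConfiguration().set("mapred.child.java.opts", "-Xmx1024m");

// Equivalent on the command line, parsed by the ToolRunner:
//   hadoop jar myjob.jar MyJob -Dmapred.child.java.opts=-Xmx1024m input output
```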
| Step 25: Reducers, combiners and partitioners. | Overview | Step 27: Custom data types. |
