Institute of Formal and Applied Linguistics Wiki

courses:mapreduce-tutorial:step-26 [2012/01/31 11:29] straka
====== MapReduce Tutorial : Compression and job configuration ======

===== Compression =====

The output files can be compressed using
<code java>
  FileOutputFormat.setCompressOutput(job, true);
</code>

The default compression format is ''deflate'' -- raw Zlib compression. Several other compression formats can be selected:
<code java>
import org.apache.hadoop.io.compress.*;
  ...
  FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);   // .gz
  FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);  // .bz2
</code>

Of course, files in any of these formats are decompressed transparently when they are read.
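
A driver combining both calls could look as follows (a sketch only; the ''job'' object and the output path are assumed to be set up as in the previous steps of this tutorial):

<code java>
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Assuming job is an org.apache.hadoop.mapreduce.Job created earlier.
  FileOutputFormat.setCompressOutput(job, true);                    // compress the output
  FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);  // as .gz files
</code>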
  
===== Job configuration =====
  
The job properties can be set:
  * on the command line -- the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/util/ToolRunner.html|ToolRunner]] parses options in the format ''-Dname=value''. See the [[.:step-24#running-the-job|syntax of the hadoop script]].
  * using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html|Job]] object -- calling ''job.getConfiguration()'' retrieves a [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/conf/Configuration.html|Configuration]] object, which provides the following methods:
    * ''String get(String name)'' -- get the value of the ''name'' property, or ''null'' if it does not exist.
    * ''String get(String name, String defaultValue)'' -- get the value of the ''name'' property, or ''defaultValue'' if it does not exist.
    * ''getBoolean'', ''getClass'', ''getFile'', ''getFloat'', ''getInt'', ''getLong'', ''getStrings'' -- return a typed value of the ''name'' property (i.e., number, file name, class name, ...).
    * ''set(String name, String value)'' -- set the value of the ''name'' property to ''value''.
    * ''setBoolean'', ''setClass'', ''setFile'', ''setFloat'', ''setInt'', ''setLong'', ''setStrings'' -- set the typed value of the ''name'' property (i.e., number, file name, class name, ...).
  * in a mapper or a reducer, the ''context'' object also provides the ''getConfiguration()'' method, so the job properties can be accessed in the mappers and reducers too.
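
For illustration, a driver can pass a custom value to its mappers through the configuration (a sketch; the property name ''tutorial.max.length'' is made up for this example):

<code java>
// In the driver: store a custom property in the job configuration.
job.getConfiguration().setInt("tutorial.max.length", 100);

// In the mapper or reducer: read it back, using 50 as the default.
public void setup(Context context) {
  int maxLength = context.getConfiguration().getInt("tutorial.max.length", 50);
}
</code>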

Apart from the already mentioned [[.:step-9#a-brief-list-of-hadoop-options|brief list of Hadoop properties]], there is one important Java-specific property:
  * **''mapred.child.java.opts''** with default value **''-Xmx200m''**. This property sets the Java options used for every task attempt (mapper, reducer, combiner, partitioner). The default value ''-Xmx200m'' specifies the maximum size of the memory allocation pool. If your mappers and reducers need 1GB of memory, use ''-Xmx1024m''. Other Java options can be found in ''man java''.
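
For example, to raise the limit to 1GB, the property can be passed on the command line as ''-Dmapred.child.java.opts=-Xmx1024m'', or set in the driver (a sketch, assuming the ''job'' object from the previous steps):

<code java>
// In the job driver, before submitting the job:
job.getConfiguration().set("mapred.child.java.opts", "-Xmx1024m");
</code>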

----
  
<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-25|Step 25]]: Reducers, combiners and partitioners.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-27|Step 27]]: Custom data types.<html></td>
</tr>
</table>
</html>
