====== MapReduce Tutorial : Compression and job configuration ======
  
===== Compression =====

The output files can be compressed using:
<code java>
  FileOutputFormat.setCompressOutput(job, true);
</code>

The default compression format is ''deflate'' -- raw Zlib compression. Several other compression formats can be selected:
<code java>
import org.apache.hadoop.io.compress.*;
  ...
  FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);   // .gz
  FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);  // .bz2
</code>

Of course, any of these formats is decompressed transparently when the file is being read.
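
To see where these calls belong, here is a minimal driver sketch (the class name and the ''/tmp/input'' and ''/tmp/output'' paths are made up for illustration; only the two compression calls come from the text above):

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "compressed-output");
    job.setJarByClass(CompressedOutputJob.class);

    // No mapper or reducer is set, so the identity Mapper and Reducer run;
    // with the default TextInputFormat their output types are (LongWritable, Text).
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path("/tmp/input"));
    FileOutputFormat.setOutputPath(job, new Path("/tmp/output"));

    // Compress the job output; the part files are then named part-r-NNNNN.gz.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
</code>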
  
===== Job configuration =====
  
The job properties can be set:
  * on the command line -- the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/util/ToolRunner.html|ToolRunner]] parses options in the format ''-Dname=value''. See the [[.:step-24#running-the-job|syntax of the hadoop script]].
  * using the [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapreduce/Job.html|Job]] object -- its ''getConfiguration()'' method retrieves a [[http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/conf/Configuration.html|Configuration]] object, which provides the following methods (see the sketch after this list):
    * ''String get(String name)'' -- get the value of the ''name'' property, or ''null'' if it does not exist.
    * ''String get(String name, String defaultValue)'' -- get the value of the ''name'' property, or ''defaultValue'' if it does not exist.
    * ''getBoolean'', ''getClass'', ''getFile'', ''getFloat'', ''getInt'', ''getLong'', ''getStrings'' -- return a typed value of the ''name'' property (i.e., a number, file name, class name, ...).
    * ''set(String name, String value)'' -- set the value of the ''name'' property to ''value''.
    * ''setBoolean'', ''setClass'', ''setFile'', ''setFloat'', ''setInt'', ''setLong'', ''setStrings'' -- set a typed value of the ''name'' property (i.e., a number, file name, class name, ...).
  * in a mapper or a reducer, the ''context'' object also provides the ''getConfiguration()'' method, so the job properties can be accessed in the mappers and reducers too.
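
A short sketch of these accessors (the property name ''tutorial.min.count'' is made up for illustration):

<code java>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
  ...
  Job job = new Job(new Configuration(), "configured-job");
  Configuration conf = job.getConfiguration();

  // Set a property before submitting the job.
  conf.setInt("tutorial.min.count", 5);

  // Read it back; the second argument is the default returned when the property is unset.
  int minCount = conf.getInt("tutorial.min.count", 1);
  String raw = conf.get("tutorial.min.count");   // "5", or null if the property were unset

  // Inside a mapper or a reducer, the same property is reached through the context:
  //   int minCount = context.getConfiguration().getInt("tutorial.min.count", 1);
</code>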

Apart from the already mentioned [[.:step-9#a-brief-list-of-hadoop-options|brief list of Hadoop properties]], there is one important Java-specific property:
  * **''mapred.child.java.opts''** with the default value **''-Xmx200m''**. This property sets the Java options used for every task attempt (mapper, reducer, combiner, partitioner). The default value ''-Xmx200m'' specifies the maximum size of the memory allocation pool. If your mappers and reducers need 1GB of memory, use ''-Xmx1024m'' (see the example below). Other Java options can be found in ''man java''.
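
For example, to give every task attempt a 1GB heap, the property can be set in the driver before submitting the job (a sketch; when the job is run through ''ToolRunner'', the same effect is achieved on the command line with ''-Dmapred.child.java.opts=-Xmx1024m''):

<code java>
  // Every mapper and reducer JVM is then started with -Xmx1024m.
  job.getConfiguration().set("mapred.child.java.opts", "-Xmx1024m");
</code>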


----
  
<html>
<table style="width:100%">
<tr>
<td style="text-align:left; width: 33%; "></html>[[step-25|Step 25]]: Reducers, combiners and partitioners.<html></td>
<td style="text-align:center; width: 33%; "></html>[[.|Overview]]<html></td>
<td style="text-align:right; width: 33%; "></html>[[step-27|Step 27]]: Running multiple Hadoop jobs in one source file.<html></td>
</tr>
</table>
</html>
