The output files can be compressed using
FileOutputFormat.setCompressOutput(job, true);
The default compression format is deflate
– raw Zlib compression. Several other compression formats can be selected:
import org.apache.hadoop.io.compress.*;
...
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);  // .gz
FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class); // .bz2
Of course, any of these formats is decompressed transparently when the file is read.
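To put these calls in context, here is a minimal driver sketch for a map-only job writing bzip2-compressed output. The class name, the identity Mapper and the argument handling are illustrative, not prescribed by the text above; on older Hadoop releases, new Job(conf, name) replaces Job.getInstance.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputJob {                     // illustrative class name
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "compressed output");
    job.setJarByClass(CompressedOutputJob.class);
    job.setMapperClass(Mapper.class);                  // identity mapper
    job.setNumReduceTasks(0);                          // map-only job for brevity
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    FileOutputFormat.setCompressOutput(job, true);                    // enable output compression
    FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class); // write .bz2 files

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}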
The job properties can be set on the command line using
-Dname=value
(see the syntax of the hadoop script). Using .getConfiguration(), a Configuration object is retrieved. It provides the following methods:
String get(String name) – get the value of the name property, or null if it does not exist.
String get(String name, String defaultValue) – get the value of the name property, or defaultValue if it does not exist.
getBoolean, getClass, getFile, getFloat, getInt, getLong, getStrings – return a typed value of the name property (i.e., a number, a file name, a class name, …).
set(String name, String value) – set the value of the name property to value.
setBoolean, setClass, setFile, setFloat, setInt, setLong, setStrings – set a typed value of the name property (i.e., a number, a file name, a class name, …).
The context object also provides the getConfiguration() method, so the job properties can be accessed in the mappers and reducers too.
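As a sketch of how a property travels from the driver to the tasks, consider a mapper that reads a boolean flag in its setup method; the property name wordcount.case.sensitive and the mapper itself are invented for this example.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: lowercases its input unless a job property says otherwise.
public class CaseAwareMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  private boolean caseSensitive;

  @Override
  protected void setup(Context context) {
    // The second argument of getBoolean is the default used when the property is unset.
    caseSensitive = context.getConfiguration().getBoolean("wordcount.case.sensitive", true);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = caseSensitive ? value.toString() : value.toString().toLowerCase();
    for (String word : line.split("\\s+")) {
      if (!word.isEmpty()) {
        context.write(new Text(word), new LongWritable(1));
      }
    }
  }
}

In the driver, the flag would be set before submission with job.getConfiguration().setBoolean("wordcount.case.sensitive", false), or on the command line with -Dwordcount.case.sensitive=false.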
Apart from the already mentioned Hadoop properties, there is one important Java-specific property: mapred.child.java.opts, with default value -Xmx200m. This property sets the Java options used for every task attempt (mapper, reducer, combiner, partitioner). The default value -Xmx200m specifies the maximum size of the memory allocation pool. If your mappers and reducers need 1 GB of memory, use -Xmx1024m. Other Java options can be found in man java.
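For example, to raise the limit to 1 GB for a single run without recompiling, the property can be passed on the command line (the jar, class and path names below are made up; this assumes the driver parses generic options, e.g. through ToolRunner):

hadoop jar myjob.jar MyJob -Dmapred.child.java.opts=-Xmx1024m input/ output/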