MapReduce Tutorial : Managing a Hadoop cluster
Hadoop clusters can be created and stopped dynamically, using the SGE cluster. A Hadoop cluster consists of one jobtracker (the master of the cluster) and multiple tasktrackers. The cluster is identified by its jobtracker. The jobtracker listens on two ports – one is used to submit jobs, the other serves a web interface.
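As an illustration only (the host and ports below are hypothetical – the real values are reported when the cluster starts), the two ports could be used as follows, assuming the standard Hadoop 1.x client:

  # list jobs through the jobtracker's submission port, via Hadoop's generic -jt option
  hadoop job -jt hador10:9001 -list
  # open the jobtracker's web interface in a browser
  firefox http://hador10:50030/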
A Hadoop cluster can be created:
- for a specific Hadoop job, by executing the job with the -c option (see Running jobs),
- manually, using the /net/projects/hadoop/bin/hadoop-cluster script:

  /net/projects/hadoop/bin/hadoop-cluster -c number_of_machines -w seconds_until_cluster_terminates
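For example, the following requests a cluster of 10 machines that shuts down automatically after one hour (the numbers are only illustrative):

  /net/projects/hadoop/bin/hadoop-cluster -c 10 -w 3600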
When a Hadoop cluster starts, it submits a job to the SGE cluster. The job creates 3 files in the current directory:
- HadoopCluster.c$SGE_JOBID – high-level status of the Hadoop computation
- HadoopCluster.o$SGE_JOBID – contains the stdout and stderr of the Hadoop job
- HadoopCluster.po$SGE_JOBID – contains the stdout and stderr of the Hadoop cluster
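A minimal sketch of checking on a running cluster from the directory it was started in (the SGE job id 123456 is hypothetical):

  # follow the high-level status of the computation
  tail -f HadoopCluster.c123456
  # inspect the cluster's stdout and stderr for startup messages
  less HadoopCluster.po123456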
A Hadoop cluster is stopped:
- after the timeout specified by -w,
- when the HadoopCluster.c$SGE_JOBID file is deleted,
- using qdel.
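For example, either of the following stops a running cluster (again with a hypothetical SGE job id, which qstat lists):

  # delete the status file in the directory the cluster was started from
  rm HadoopCluster.c123456
  # or delete the SGE job itself
  qdel 123456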