MapReduce Tutorial : Managing a Hadoop cluster
Hadoop clusters can be created and stopped dynamically, using the SGE cluster. A Hadoop cluster consists of one jobtracker (the master of the cluster) and multiple tasktrackers. The cluster is identified by its jobtracker. The jobtracker listens on two ports – one is used to submit jobs, the other serves a web interface.
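As an illustration only (the host and ports below are hypothetical – the real values are reported when the cluster starts), the two ports could be used as follows, assuming the standard Hadoop 1.x client:

  # list jobs through the jobtracker's submission port, via Hadoop's generic -jt option
  hadoop job -jt hador10:9001 -list
  # open the jobtracker's web interface in a browser
  firefox http://hador10:50030/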
A Hadoop cluster can be created:
- for a specific Hadoop job, by executing the job with the -c option (see Running jobs),
- manually, using the /net/projects/hadoop/bin/hadoop-cluster script:

  /net/projects/hadoop/bin/hadoop-cluster -c number_of_machines -w seconds_until_cluster_terminates
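For example, the following requests a cluster of 10 machines that shuts down automatically after one hour (the numbers are only illustrative):

  /net/projects/hadoop/bin/hadoop-cluster -c 10 -w 3600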
When a Hadoop cluster starts, it submits a job to the SGE cluster. The job creates 3 files in the current directory:
- HadoopCluster.c$SGE_JOBID – high-level status of the Hadoop computation
- HadoopCluster.o$SGE_JOBID – contains the stdout and stderr of the Hadoop job
- HadoopCluster.po$SGE_JOBID – contains the stdout and stderr of the Hadoop cluster
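A minimal sketch of checking on a running cluster from the directory it was started in (the SGE job id 123456 is hypothetical):

  # follow the high-level status of the computation
  tail -f HadoopCluster.c123456
  # inspect the cluster's stdout and stderr for startup messages
  less HadoopCluster.po123456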
A Hadoop cluster is stopped:
- after the timeout specified by -w,
- when the HadoopCluster.c$SGE_JOBID file is deleted,
- using qdel.
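For example, either of the following stops a running cluster (again with a hypothetical SGE job id, which qstat lists):

  # delete the status file in the directory the cluster was started from
  rm HadoopCluster.c123456
  # or delete the SGE job itself
  qdel 123456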