
MapReduce Tutorial : Dynamic Hadoop cluster for several computations

When several MR jobs need to be executed, it is better to reuse one cluster than to allocate a new one for every computation.

A cluster can be created using

/home/straka/hadoop/bin/hadoop-cluster -c number_of_machines -w sec_to_run_the_cluster_for

The options use the same syntax as in perl script.pl run.

The associated SGE job is named HadoopCluster. The running job can be stopped either by removing the HadoopCluster.c$SGE_JOBID file or by deleting the SGE job using qdel.
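The start and stop steps above can be sketched as a small dry-run script. It only prints the commands so they can be checked before being run. The machine count, the duration and the job ID are made-up values, the helper function names are hypothetical, and the stop file is assumed to be in the current directory, which the text above does not state.

```shell
# Path to the launcher, taken from the command above.
HADOOP_CLUSTER=/home/straka/hadoop/bin/hadoop-cluster

# Print the command that starts a cluster (hypothetical helper).
# $1 = number of machines, $2 = seconds to run the cluster for.
start_cluster_cmd() {
    echo "$HADOOP_CLUSTER -c $1 -w $2"
}

# Print the two alternative ways of stopping the cluster (hypothetical helper).
# $1 = SGE job ID of the HadoopCluster job.
# Assumption: the HadoopCluster.c<jobid> file is in the current directory.
stop_cluster_cmds() {
    echo "rm HadoopCluster.c$1"
    echo "qdel $1"
}

start_cluster_cmd 10 3600   # 10 machines for one hour (made-up values)
stop_cluster_cmds 12345     # 12345 is a made-up SGE job ID
```

Only one of the two printed stop commands is needed. To actually run a printed command, copy it to the prompt or pipe the corresponding line to sh.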

Using a running cluster

A running cluster is identified by its master. When running a Perl MR job, an existing cluster can be used by

perl script.pl run -jt hostname_of_cluster_master:9001 ...
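To illustrate reusing one cluster for several computations, the sketch below builds the command line for each job from a single master setting. It is a dry run that only prints the commands. The script names and the trailing arguments are hypothetical placeholders, the job_cmd helper is not part of the tutorial, and the master host name must be replaced with that of the actual running cluster.

```shell
# Placeholder: replace with the master host of the running cluster.
MASTER=hostname_of_cluster_master

# Print the command that runs one Perl MR job on the shared cluster.
# $1 = Perl script, remaining arguments are passed through unchanged.
# Port 9001 is taken from the command above.
job_cmd() {
    script=$1
    shift
    echo "perl $script run -jt $MASTER:9001" "$@"
}

# Two hypothetical jobs sharing the same cluster.
job_cmd first_job.pl arg1 arg2
job_cmd second_job.pl arg3 arg4
```

Keeping the master in one variable means every job in a batch targets the same cluster, and switching to another cluster is a one-line change.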


Institute of Formal and Applied Linguistics Wiki