--- courses:mapreduce-tutorial:step-7 [2012/01/24 19:05]
straka created
+++ courses:mapreduce-tutorial:step-7 [2012/01/25 22:01]
straka
@@ Line 1: / Line 1: @@
-====== MapReduce Tutorial : ======
+====== MapReduce Tutorial : Dynamic Hadoop cluster for several computations ======
+When several Hadoop jobs need to be run, it is better to reuse one cluster than to allocate a new one for every computation.
+A cluster can be created using:
+  /home/straka/hadoop/bin/hadoop-cluster -c number_of_machines -w sec_to_run_the_cluster_for
+The ''-c'' and ''-w'' options have the same syntax as in ''perl script.pl run''.
+The associated SGE job name is HadoopCluster. The running cluster can be stopped either by removing the ''HadoopCluster.c$SGE_JOBID'' file or by deleting the SGE job with ''qdel''.
+===== Using a running cluster =====
+A running cluster is identified by its master. When running a Hadoop job using the Perl API, an existing cluster can be used with the ''-jt'' option:
+  perl script.pl run -jt cluster_master:9001 ...
+===== Example =====
+Try running the same script {{:courses:mapreduce-tutorial:step-6.txt|wordcount.pl}} as in the last step, this time by first creating a cluster and then submitting the job to it:
+  /home/straka/hadoop/bin/hadoop-cluster -c 1 -w 600
+  perl wordcount.pl run -jt cluster_master:9001 -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium some_output_directory
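The point of a shared cluster is that several jobs can be submitted to the same master. The following is a minimal sketch of that pattern, not part of the tutorial itself: the helper ''run_on_cluster'', the ''MASTER'' and ''DRY_RUN'' variables, the second split size and the output directory names are all hypothetical. It assumes the cluster was already started with ''hadoop-cluster'' and that you know the host name of its master.

```shell
#!/bin/sh
# Sketch: submit several Perl API jobs to one already-running Hadoop cluster.
# MASTER is an assumption: set it to the host name of your cluster master.
MASTER="${MASTER:-cluster_master}"
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run_on_cluster SCRIPT ARGS...
# Runs "perl SCRIPT run -jt MASTER:9001 ARGS..." (hypothetical helper).
run_on_cluster() {
    script="$1"
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo perl "$script" run -jt "$MASTER:9001" "$@"
    else
        perl "$script" run -jt "$MASTER:9001" "$@"
    fi
}

# Two jobs reuse the same cluster. The second split size and both
# output directory names are illustrative placeholders.
run_on_cluster wordcount.pl -Dmapred.max.split.size=1000000 /home/straka/wiki/cs-text-medium output_1
run_on_cluster wordcount.pl -Dmapred.max.split.size=500000 /home/straka/wiki/cs-text-medium output_2
```

With the default ''DRY_RUN=1'' the script only prints the two ''perl ... run -jt ...'' command lines, which makes it easy to check them before submitting anything to the cluster.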

Institute of Formal and Applied Linguistics Wiki