Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
|
courses:mapreduce-tutorial:step-7 [2012/01/24 19:05] straka created |
courses:mapreduce-tutorial:step-7 [2013/02/08 14:36] (current) popel Milan improved our Hadoop |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== MapReduce Tutorial : ====== | + | ====== MapReduce Tutorial : Dynamic Hadoop cluster for several computations |
| + | |||
| + | When several Hadoop jobs need to be run, it is better to reuse one cluster than to allocate a new one for every computation. | ||
| + | |||
| + | A cluster can be created using | ||
| + | / | ||
| + | The syntax is the same as in '' | ||
| + | |||
| + | The associated SGE job name is HadoopCluster. The running job can be stopped by either removing '' | ||
| + | |||
| + | ===== Using a running cluster ===== | ||
| + | A running cluster is identified by its master. When running a Hadoop job using the Perl API, an existing cluster can be used by | ||
| + | perl script.pl -jt cluster_master: | ||
| + | |||
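To illustrate reusing one cluster for several jobs, here is a minimal dry-run sketch. It only prints the commands instead of running them. The function name `jobs_on_cluster`, the variable `CLUSTER_MASTER`, and the script names are hypothetical; only the `-jt` option comes from the text above, and any further per-job arguments are left out. Replace the placeholder with the master address of your own running cluster.

```shell
#!/bin/sh
# Dry-run sketch: print one "perl SCRIPT -jt MASTER" command per job,
# so that all jobs target the same running cluster.
# CLUSTER_MASTER is a placeholder (assumption); set it to the master
# address of your running cluster.
CLUSTER_MASTER=${CLUSTER_MASTER:-cluster_master:PORT}

jobs_on_cluster() {
    master=$1
    shift
    for script in "$@"; do
        # Per-job arguments (input, output, ...) are omitted in this sketch.
        echo "perl $script -jt $master"
    done
}

# Hypothetical script names, used only for illustration.
jobs_on_cluster "$CLUSTER_MASTER" first-job.pl second-job.pl
```

Once the printed commands look right, the `echo` can be dropped so that each job is actually submitted to the shared cluster.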
| + | ===== Running Hadoop jobs from now on ===== | ||
| + | |||
| + | From now on, it is best to run MR jobs using a one-machine cluster -- create a one-machine cluster using '' | ||
| + | |||
| + | ===== Example ===== | ||
| + | |||
| + | Try running the same script {{: | ||
| + | wget --no-check-certificate ' | ||
| + | / | ||
| + | # NOW VIEW THE FILE | ||
| + | # $EDITOR step-7-wordcount.pl | ||
| + | rm -rf step-7-out-sol; | ||
| + | less step-7-out-sol/ | ||
| + | Remarks: | ||
| + | * The reducers seem to start running before the mappers finish. In the web interface, the reported progress of a reducer is divided into thirds: | ||
| + | * during the first 33%, the mapper outputs are copied to the machine where the reducer runs. | ||
| + | * during the second 33%, the (key, value) pairs are sorted. | ||
| + | * during the last 33%, the user-defined reducer runs. | ||
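The thirds described above can be turned into a small lookup, which may help when reading the reducer percentage in the web interface. This is only a sketch: the function name `reducer_phase` is hypothetical, and the boundaries 33 and 66 are taken as rough cut-offs from the remarks, not as exact values used by Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: map a reducer progress percentage (integer 0-100)
# to the phase described in the remarks.
reducer_phase() {
    pct=$1
    if [ "$pct" -lt 33 ]; then
        echo "copy"      # mapper outputs are being copied to the reducer machine
    elif [ "$pct" -lt 66 ]; then
        echo "sort"      # (key, value) pairs are being sorted
    else
        echo "reduce"    # the user-defined reducer is running
    fi
}

reducer_phase 10
reducer_phase 50
reducer_phase 90
```

A reducer shown at a low percentage while mappers are still running is therefore most likely only copying mapper outputs, not yet executing user code.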
| + | |||
