spark:running-spark-on-single-machine-or-on-cluster [2023/11/07 12:48] (current) straka
Spark computations can be started both on desktop machines and on cluster machines, either by specifying ''…''.

Note that when you use ''…''.
===== Starting Spark Cluster =====
The Spark cluster can be started using one of the following two commands:
  * ''…''
  * ''…''
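The names of the two site-specific wrapper commands are truncated in this revision. As a rough equivalent, with a plain Spark standalone installation (an assumption — these are the stock Spark scripts, not the wrappers this page documents), starting a cluster by hand looks like:

```shell
# Sketch only: stock Spark standalone scripts, not the site wrappers above.
# Assumes SPARK_HOME points at an unpacked Spark distribution.

# Start a master; its spark://host:7077 URL appears in its log.
"$SPARK_HOME/sbin/start-master.sh"

# On each worker machine, start a worker pointed at the master URL.
"$SPARK_HOME/sbin/start-worker.sh" spark://master-host:7077

# Tear the cluster down when done.
"$SPARK_HOME/sbin/stop-worker.sh"
"$SPARK_HOME/sbin/stop-master.sh"
```

The wrapper commands presumably submit these steps as cluster jobs instead of running them by hand.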
==== Memory Specification ====
TL;DR: A good default is ''…''.

The memory for each worker is specified using the following format: ''…''

The Spark memory limits the Java heap, and half of it is reserved for in-memory storage of cached RDDs. The second value sets a memory limit for every Python process and is by default set to ''…''.
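To make the split concrete, a quick sketch of the arithmetic (the one-half storage fraction is taken from the paragraph above; the 2G worker size is just an example value, not the page's recommendation):

```shell
# Example: a worker started with 2G of Spark memory.
worker_mem_mb=2048

# The Spark memory bounds the Java heap; half of it is reserved
# for in-memory storage of cached RDDs (per the text above).
storage_mb=$((worker_mem_mb / 2))

echo "heap limit: ${worker_mem_mb} MB, RDD storage: ${storage_mb} MB"
```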
==== Examples ====
Start a Spark cluster with 10 workers with 2GB RAM each and then run an interactive shell. The cluster stops after the shell exits.
<code>…</code>

Start a Spark cluster with 20 workers with 4GB RAM each in the ''…''. The cluster has to be stopped manually.
<code>…</code>
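The exact example commands are truncated in this revision. Against a generic standalone master (hypothetical host name; the memory size mirrors the first example above), running the interactive shell would look roughly like:

```shell
# Sketch only: generic PySpark invocation, not the site-specific wrapper.
# --master points at the running cluster, --executor-memory at the
# per-worker memory limit.
pyspark --master spark://master-host:7077 --executor-memory 2G
```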
