spark:running-spark-on-single-machine-or-on-cluster [2023/11/07 12:48] (current) straka [Starting Spark Cluster]
Spark computations can be started both on desktop machines and on cluster machines, either by specifying ''
Note that when you use ''
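The truncated sentence above concerns where a Spark computation runs. In stock Spark this is selected with the ''--master'' option of ''spark-submit'' or ''pyspark''; a hedged sketch (the cluster host name is a hypothetical placeholder, not this site's actual master):

```shell
# Run locally on the desktop, using all available cores:
spark-submit --master "local[*]" my_job.py

# Run locally with exactly 4 worker threads:
pyspark --master "local[4]"

# Connect to a standalone cluster master (hypothetical host; 7077 is
# Spark's default standalone master port):
spark-submit --master spark://cluster-head.example.org:7077 my_job.py
```

The site-specific wrappers described below effectively fill in the ''--master'' URL for you once a cluster is running.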
===== Starting Spark Cluster =====
The Spark cluster can be started using one of the following two commands:
  * ''
  * ''
==== Memory Specification ====
TL;DR: Good default is ''
The memory for each worker is specified using the following format: <
The Spark memory limits the Java heap, and half of it is reserved for in-memory storage of cached RDDs. The second value sets a memory limit for every Python process and is by default set to ''
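The "half of it" figure above can be made precise. Since Spark 1.6, the heap split is governed by the configuration options ''spark.memory.fraction'' (default 0.6) and ''spark.memory.storageFraction'' (default 0.5), applied after roughly 300 MB of reserved memory is set aside; note the storage/execution boundary is soft, so cached data may actually grow beyond this share. A minimal sketch of the arithmetic:

```shell
# Default unified-memory split (Spark >= 1.6 defaults):
#   storage = (heap - 300 MB reserved) * spark.memory.fraction (0.6)
#                                      * spark.memory.storageFraction (0.5)
heap_mb=2048
storage_mb=$(awk -v h="$heap_mb" 'BEGIN { printf "%.1f", (h - 300) * 0.6 * 0.5 }')
echo "default storage memory for a ${heap_mb} MB heap: ${storage_mb} MB"
```

For a 2 GB worker heap this yields about 524 MB guaranteed for cached RDDs/DataFrames by default.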
==== Examples ====
Start a Spark cluster with 10 workers, 2GB RAM each, and then run an interactive shell. The cluster stops after the shell exits.
<
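The example's code block is truncated here, and the site-specific launcher command is not recoverable from this revision. With stock Spark, a comparable standalone setup looks like the following (''$SPARK_HOME'' and the host are assumptions; unlike the site wrapper, a cluster started this way keeps running after the shell exits until explicitly stopped):

```shell
# Start a standalone master on the current machine (stock Spark scripts):
"$SPARK_HOME/sbin/start-master.sh"

# Start a worker pointing at the master; -m caps the worker's memory
# (2G per worker, matching the example above). Repeat on each machine:
"$SPARK_HOME/sbin/start-worker.sh" -m 2G "spark://$(hostname):7077"

# Run the interactive PySpark shell against the cluster (exit with Ctrl-D):
pyspark --master "spark://$(hostname):7077"

# Unlike the wrapper described above, stop the cluster manually when done:
"$SPARK_HOME/sbin/stop-worker.sh"
"$SPARK_HOME/sbin/stop-master.sh"
```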
- | + | ||
- | Start Spark cluster with 20 machines 512MB RAM each. The cluster has to be stopped manually using '' | + | |
- | < | + | |
- | + | ||
- | Note that a running Spark cluster can currently be used only from other cluster machines (connections to a running SGE Spark cluster from my workstation ends with timeout). | + | |
- | + | ||
- | ==== Additional SGE Options ==== | + | |
- | Additional | + | Start Spark cluster with 20 workers 4GB RAM each in the '' |
- | < | + | < |