Differences
This shows you the differences between two versions of the page.
spark:running-spark-on-single-machine-or-on-cluster [2022/12/14 12:49] straka [Starting Spark Cluster]
spark:running-spark-on-single-machine-or-on-cluster [2022/12/14 12:58] straka [Additional SGE Options]

Line 27:
  ==== Memory Specification ====
- Memory specification used for master and worker heap size (and for ''
+ TL;DR: Good default is ''
+ The memory for each worker is specified using the following format: <
+ The Spark memory limits the Java heap, and half of it is reserved for memory storage of cached RDDs. The second value sets a memory limit of every Python process and is by default set to ''
  ==== Examples ====
- Start Spark cluster with 10 machines 1GB RAM each and then run interactive shell. The cluster stops after the shell is exited.
+ Start Spark cluster with 10 workers 2GB RAM each and then run interactive shell. The cluster
- <
+ <
- Start Spark cluster with 20 machines 512MB RAM each. The cluster
+ Start Spark cluster
- Note that a running
+ <
- ==== Additional SGE Options ====
- Additional ''
- <
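The example `<code>` blocks in the diff above are truncated, so the site-specific launcher commands cannot be recovered here. As a hedged sketch using only standard Spark options (the master URL `spark://master.example.com:7077` is a hypothetical placeholder; on this cluster it would come from the page's own launcher script), an interactive shell with the memory settings described above could look like:

```shell
# Sketch only, not the wiki's actual launcher: connect an interactive
# PySpark shell to an already-running standalone Spark cluster.
# - spark.executor.memory caps each worker's Java heap (half of which
#   Spark reserves for storage of cached RDDs, as the text above notes);
# - spark.python.worker.memory is the standard Spark property bounding
#   memory used by each Python worker before spilling to disk.
pyspark \
  --master spark://master.example.com:7077 \
  --conf spark.executor.memory=2G \
  --conf spark.python.worker.memory=2G
```

This is a configuration fragment rather than a runnable script; it requires a reachable Spark master, which on the cluster described by this page is started by the (truncated) commands in the Examples section.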