spark:running-spark-on-single-machine-or-on-cluster, revision 2022/12/14 12:57 by straka
==== Memory Specification ====
TL;DR: Good default is ''
The memory for each worker is specified using the following format:
The Spark memory limits the Java heap, and half of it is reserved for memory storage of cached RDDs. The second value sets a memory limit for every Python process and is by default set to ''
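For comparison, the two limits described above correspond to the standard Spark options ''spark.executor.memory'' (the Java heap) and ''spark.executor.pyspark.memory'' (the per-Python-process limit). A minimal sketch, assuming a plain ''spark-submit'' invocation instead of the cluster wrapper scripts, and with ''my_script.py'' as a placeholder name:

```shell
# Sketch only: equivalent memory limits via stock Spark options
# (assumption: plain spark-submit, not the site wrapper scripts;
# my_script.py is a placeholder).
# spark.executor.memory limits the Java heap, half of which is
# used for memory storage of cached RDDs;
# spark.executor.pyspark.memory limits each Python worker process.
spark-submit \
  --conf spark.executor.memory=2G \
  --conf spark.executor.pyspark.memory=2G \
  my_script.py
```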
==== Examples ====
Start a Spark cluster with 10 workers with 2GB RAM each and then run an interactive shell. The cluster stops after the shell exits.
Start a Spark cluster with 20 workers with 4GB RAM each, and run ''
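For reference, similar setups can be approximated with the stock Spark standalone scripts. This is a sketch only: the ''$SPARK_HOME/sbin'' paths and the ''master-node'' hostname are assumptions, and the cluster wrapper scripts above should be preferred when available.

```shell
# Sketch using stock Spark standalone scripts (assumed to live in
# $SPARK_HOME/sbin; "master-node" is a placeholder hostname).
$SPARK_HOME/sbin/start-master.sh
# Start a worker with a 2G memory limit, pointing it at the master.
$SPARK_HOME/sbin/start-worker.sh -m 2G spark://master-node:7077
# Run an interactive shell against the cluster; unlike the wrapper
# scripts, these workers keep running and must be stopped manually.
spark-shell --master spark://master-node:7077
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/stop-master.sh
```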
==== Additional SGE Options ====