Differences

This shows you the differences between two versions of the page.

--- spark:running-spark-on-single-machine-or-on-cluster [2022/12/14 12:49]
straka [Starting Spark Cluster]
+++ spark:running-spark-on-single-machine-or-on-cluster [2022/12/14 12:53]
straka [Memory Specification]
@@ Line 27: / Line 27: @@
 ==== Memory Specification ====
-Memory specification used for master and worker heap size (and for ''mem_free'' SGE constraint) must be specified. The memory can be specified either in bytes, or using ''kK/mM/gG'' suffix. A reasonable default value is 512M or 1G.
+TL;DR: A good default is ''2G''.
+The memory for each worker is specified using the following format: <file>spark_memory_per_workerG[:memory_per_Python_processG]</file>
+The first value (the Spark memory) limits the Java heap, half of which is reserved for in-memory storage of cached RDDs. The optional second value sets the memory limit of each Python process and defaults to ''2G''.
 ==== Examples ====
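The memory specification format described above can be illustrated with a small parser. This is only a sketch: ''parse_memory_spec'' and the returned key names are hypothetical and not part of the Spark scripts on this wiki, and it assumes that both values are whole gigabytes written with a ''G'' suffix, as the format string suggests. The real scripts may accept other forms.

```python
import re

# Hypothetical helper for illustration only; not part of the actual Spark scripts.
# Assumed format: spark_memory_per_workerG[:memory_per_Python_processG]
_SPEC = re.compile(r"^(\d+)G(?::(\d+)G)?$")


def parse_memory_spec(spec, default_python_gb=2):
    """Split a memory specification into its Spark and Python parts (in GB)."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"invalid memory specification: {spec!r}")
    spark_gb = int(match.group(1))
    # The second value is optional; the text above gives 2G as its default.
    python_gb = int(match.group(2)) if match.group(2) else default_python_gb
    return {
        "spark_heap_gb": spark_gb,
        # Half of the Spark memory is reserved for storing cached RDDs.
        "cached_rdd_storage_gb": spark_gb / 2,
        "python_process_gb": python_gb,
    }
```

Under these assumptions, ''parse_memory_spec("4G:1G")'' describes a worker with a 4 GB Java heap, 2 GB of it reserved for cached RDDs, and a 1 GB limit per Python process. A plain ''2G'' leaves the Python limit at its default.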

Institute of Formal and Applied Linguistics Wiki