* ''spark-srun'': start a Spark cluster via an ''srun'' <file>spark-srun [salloc args] workers memory_per_workerG[:python_memoryG] [command arguments...]</file>

Both the ''spark-sbatch'' and ''spark-srun'' commands start a Spark cluster with the specified number of workers, each with the given amount of memory. They then set ''MASTER'' and ''SPARK_ADDRESS'' to the address of the Spark master and ''SPARK_WEBUI'' to the URL of the master web interface; these values are also written to standard output, and ''SPARK_WEBUI'' is added to the Slurm job Comment. Finally, the specified command is started; with ''spark-srun'' the command may be empty, in which case ''bash'' is opened.
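For instance, an interactive cluster could be started as follows (the worker count and memory size are purely illustrative):

<file>
# Start an interactive Spark cluster with 4 workers and 2G of memory per worker;
# no command is given, so a bash shell is opened inside the allocation.
spark-srun 4 2G

# Inside that shell, the cluster address and web UI are available:
echo $MASTER $SPARK_ADDRESS $SPARK_WEBUI
</file>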
==== Memory Specification ====

TL;DR: A good default is ''2G''.

The memory for each worker is specified using the following format: <file>spark_memory_per_workerG[:memory_per_Python_processG]</file>

The Spark memory limits the Java heap, and half of it is reserved for memory storage of cached RDDs. The second value sets the memory limit for every Python process and defaults to ''2G''.
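For example (the numbers are only illustrative), the following command starts 8 workers, each with 4G of Spark memory and an 8G limit for its Python processes:

<file>
# 8 workers, 4G of Spark (Java heap) memory per worker, 8G memory limit per Python process
spark-srun 8 4G:8G
</file>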
==== Examples ====