Spark computations can be started both on desktop machines and on cluster machines, either by specifying ''MASTER'' to one of ''local'' modes, or by not specifying MASTER at all (''local[*]'' is used then).

Note that when you use ''sbatch'' or ''srun'' to run Spark locally, your job is by default expected to use just a single core, so you should specify ''MASTER=local''. If you do not, Spark will use all cores on the machine, even though Slurm allocated only one.

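For illustration, inside such a single-core job (an ''sbatch'' script or an interactive ''srun'' session), a local run could then look as follows, assuming the ''MASTER'' environment variable is honored as described above; ''my_script.py'' is only a hypothetical placeholder script:
<file>MASTER=local spark-submit my_script.py</file>
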
===== Starting Spark Cluster =====

The Spark cluster can be started using one of the following two commands:
  * ''spark-sbatch'': start a Spark cluster via an ''sbatch'' <file>spark-sbatch [sbatch args] workers memory_per_workerG[:python_memoryG] command [arguments...]</file>
  * ''spark-srun'': start a Spark cluster via an ''srun'' (see the example after this list) <file>spark-srun [salloc args] workers memory_per_workerG[:python_memoryG] [command arguments...]</file>

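As a sketch of the ''spark-srun'' form, the following would start a cluster of 10 workers with 2GB RAM each and run a single computation on it; the partition is chosen only as an example and ''my_script.py'' is a hypothetical placeholder script:
<file>spark-srun -p cpu-ms 10 2G spark-submit my_script.py</file>
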
Start a Spark cluster with 20 workers with 4GB RAM each in the ''cpu-ms'' partition, and run ''screen'' in it, so that several computations can be performed using this cluster. The cluster has to be stopped manually (either by quitting the ''screen'' or by calling ''scancel'').
<file>spark-sbatch -p cpu-ms 20 4G screen -D -m</file>
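
When the computations are finished, the cluster started above can be stopped for example via ''scancel''; the job id below is only a hypothetical value reported by ''sbatch'':
<file>scancel 123456</file>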