slurm [2023/01/16 17:18] popel [Batch mode] |
slurm [2023/01/16 20:55] popel --nodelist problems |
#SBATCH -D /some/path/ # change directory before executing the job | #SBATCH -D /some/path/ # change directory before executing the job |
#SBATCH -N 2 # number of nodes (default 1) | #SBATCH -N 2 # number of nodes (default 1) |
#SBATCH --nodelist=node1,node2... # required node, or comma separated list of required nodes | #SBATCH --nodelist=node1,node2... # execute on *all* the specified nodes (and possibly more) |
#SBATCH --cpus-per-task=4 # number of cores/threads per task (default 1) | #SBATCH --cpus-per-task=4 # number of cores/threads per task (default 1) |
#SBATCH --gres=gpu:1 # number of GPUs to request (default 0) | #SBATCH --gres=gpu:1 # number of GPUs to request (default 0) |
| |
* ''-p cpu-troja'' explicitly requires the partition ''cpu-troja''. If it is not specified, Slurm uses the default partition. | * ''-p cpu-troja'' explicitly requires the partition ''cpu-troja''. If it is not specified, Slurm uses the default partition. |
* ''--mem=64G'' requires 64G of memory for the job | * ''-''''-mem=64G'' requires 64G of memory for the job |
| |
**To get an interactive job with a single GPU of any kind:** | **To get an interactive job with a single GPU of any kind:** |
<code>srun -p gpu-troja,gpu-ms --gres=gpu:1 --pty bash</code> | <code>srun -p gpu-troja,gpu-ms --gres=gpu:1 --pty bash</code> |
* ''-p gpu-troja,gpu-ms'' requires nodes from these two partitions only | * ''-p gpu-troja,gpu-ms'' requires nodes from these two partitions only |
* ''--gres=gpu:1'' requires 1 GPU | * ''-''''-gres=gpu:1'' requires 1 GPU |
| |
<code>srun -p gpu-troja,gpu-ms --nodelist=tdll-3gpu1 --mem=64G --gres=gpu:2 --pty bash</code> | <code>srun -p gpu-troja,gpu-ms --nodelist=tdll-3gpu1 --mem=64G --gres=gpu:2 --pty bash</code> |
* ''-p gpu-troja,gpu-ms'' requires nodes from these two partitions only | * ''-p gpu-troja,gpu-ms'' requires nodes from these two partitions only |
* ''--nodelist=tdll-3gpu1'' explicitly requires one specific node | * ''-''''-nodelist=tdll-3gpu1'' explicitly requires one specific node |
* ''--gres=gpu:2'' requires 2 GPUs | * Note that e.g. ''-''''-nodelist=tdll-3gpu[1-4]'' would run the job on **all** four machines ''tdll-3gpu[1-4]'', not on any one of them. The documentation says "The job will contain all of these hosts and possibly additional hosts as needed to satisfy resource requirements." I am not aware of any [[https://stackoverflow.com/a/37555321/3310232|simple way]] to specify that **any** of the listed nodes can be used, i.e. an equivalent of SGE ''-q '*@hector[14]'''. |
| * ''-''''-gres=gpu:2'' requires 2 GPUs |
| |
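The ''--nodelist'' note above leaves open how to say "any of these nodes". One possible workaround, sketched here as an assumption rather than a site recommendation, is to invert the request: pass every node that is //not// acceptable to the ''--exclude'' option of ''srun''. The helper ''exclude_list'' and the node name ''other-node1'' are hypothetical and only serve the illustration; the script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of every node in $1
# (space-separated list of all nodes) that is NOT in $2 (allowed nodes).
exclude_list() {
    all_nodes=$1
    allowed=$2
    result=""
    for node in $all_nodes; do
        case " $allowed " in
            *" $node "*) ;;                          # allowed: leave it schedulable
            *) result="${result:+$result,}$node" ;;  # not allowed: add to the exclude list
        esac
    done
    printf '%s\n' "$result"
}

# Node lists assumed for illustration; other-node1 is a made-up name.
all="tdll-3gpu1 tdll-3gpu2 tdll-3gpu3 tdll-3gpu4 other-node1"
allowed="tdll-3gpu1 tdll-3gpu2"

# Dry run: only print the srun command that would be used.
echo "srun -p gpu-troja --exclude=$(exclude_list "$all" "$allowed") --gres=gpu:1 --pty bash"
```

With the lists above, the printed command excludes ''tdll-3gpu3'', ''tdll-3gpu4'' and ''other-node1'', so Slurm is free to pick either of the two remaining nodes. On a real cluster the list of all nodes would have to come from the scheduler itself rather than being typed by hand.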
<code>srun -p gpu-troja --constraint="gpuram48G|gpuram40G" --mem=64G --gres=gpu:2 --pty bash</code> | <code>srun -p gpu-troja --constraint="gpuram48G|gpuram40G" --mem=64G --gres=gpu:2 --pty bash</code> |
* ''--constraint="gpuram48G|gpuram40G"'' considers only nodes that have either the ''gpuram48G'' or the ''gpuram40G'' feature defined | * ''-''''-constraint="gpuram48G|gpuram40G"'' considers only nodes that have either the ''gpuram48G'' or the ''gpuram40G'' feature defined |
| |
==== Delete Job ==== | ==== Delete Job ==== |