After submitting this simple code you should end up with the two files (''helloWorld.out'' and ''helloWorld.err'') in the directory where you called the ''sbatch'' command.

Here is a list of other useful ''SBATCH'' directives:
<code>
#SBATCH -D /some/path/                        # change directory before executing the job
#SBATCH -N 2                                  # number of nodes (default 1)
#SBATCH --nodelist=node1,node2...             # required node, or comma-separated list of required nodes
#SBATCH -c 4                                  # number of cores/threads per task (default 1)
#SBATCH --gres=gpu:1                          # number of GPUs to request (default 0)
#SBATCH --mem=10G                             # request 10 gigabytes of memory (per node, default depends on node)
</code>
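
For illustration, several of these directives can be combined in a single submission script. The following is only a sketch; the job name, working directory, and resource values are made-up examples:

<code>
#!/bin/bash
#SBATCH -J example-job                        # job name (made-up example)
#SBATCH -D /home/linguist/experiment          # made-up working directory
#SBATCH -p gpu-ms                             # partition to submit to
#SBATCH -c 4                                  # 4 cores/threads for the task
#SBATCH --gres=gpu:1                          # one GPU
#SBATCH --mem=10G                             # 10 GB of memory
#SBATCH -o example.out                        # file for standard output
#SBATCH -e example.err                        # file for standard error

nvidia-smi                                    # the actual commands of the job go here
</code>

Such a script is submitted with ''sbatch'' exactly as before.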
 +
If you need it, you can have Slurm notify you by email:

<code>
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=<YourUFALEmailAccount>
</code>
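
The ''--mail-type'' values can also be combined in a single directive, or you can request all notification types at once with ''ALL''; a brief sketch (following the lowercase spelling used above):

<code>
#SBATCH --mail-type=begin,end,fail    # several event types in one directive
#SBATCH --mail-type=ALL               # or all notification types at once
</code>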
 +
As usual, the complete set of options can be found by typing:

<code>
man sbatch
</code>
 +
==== Running jobs ====

To inspect all jobs currently running or waiting on the cluster, use:

<code>
squeue
</code>
 +
To show only jobs of the user ''linguist'':

<code>
squeue -u linguist
</code>
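
If you want to keep watching your jobs, ''squeue'' can also refresh its output periodically; for example (the 10-second interval is just an illustration):

<code>
squeue -u linguist -i 10
</code>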
 +
To show only jobs on the partition ''gpu-ms'':

<code>
squeue -p gpu-ms
</code>
 +
To show only jobs in a specific state (see ''man squeue'' for the list of valid job states):
<code>
squeue -t RUNNING
</code>
 +
To show only jobs running on a specific node:
<code>
squeue -w dll-3gpu1
</code>
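
These filters can be combined, and the displayed columns can be adjusted with ''-o''; the format string below is only an illustration (see ''man squeue'' for the available fields):

<code>
squeue -u linguist -p gpu-ms -t RUNNING            # running jobs of user linguist on gpu-ms
squeue -o "%.10i %.9P %.20j %.8u %.2t %.10M %R"    # job id, partition, name, user, state, elapsed time, nodes/reason
</code>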
 +
==== Cluster info ====

The command ''sinfo'' can give you useful information about the nodes available in the cluster. Here is a short list of examples:

List available partitions (queues). The default partition is marked with ''*'':
<code>
sinfo
</code>

List detailed info about nodes:
<code>
sinfo -l -N
</code>

List nodes with some custom format info:
<code>
sinfo -N -o "%N %P %.11T %.15f"
</code>
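
''sinfo'' can also show the generic resources (e.g. GPUs) offered by each node, for example by adding the ''%G'' field to the format string (see ''man sinfo'' for all fields):

<code>
sinfo -N -o "%N %P %G"
</code>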
  
  
<code>srun -p cpu-troja --mem=64G --pty bash</code>
  
  * ''-p cpu-troja'' explicitly requires partition ''cpu-troja''
  * ''--mem=64G'' requires 64G of memory for the job

<code>srun -p gpu-troja --nodelist=tdll-3gpu1 --mem=64G --gres=gpu:2 --pty bash</code>

  * ''--nodelist=tdll-3gpu1'' explicitly requires one specific node
  * ''--gres=gpu:2'' requires 2 GPUs
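
Other resource options work the same way in interactive mode. For example, a session with several cores and an explicit time limit (the values are only an illustration) could look as follows; exiting the shell releases the allocation:

<code>srun -p cpu-troja -c 8 --mem=32G --time=2:00:00 --pty bash</code>

  * ''-c 8'' requires 8 cores/threads
  * ''--time=2:00:00'' limits the interactive session to two hours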
  
To see all the available options, type:
