====== ÚFAL Grid Engine (LRC) ======
  
LRC (Linguistic Research Cluster) is the name of ÚFAL's computational grid/cluster. The cluster is built on top of [[https://slurm.schedmd.com/|SLURM]] and uses [[https://www.lustre.org/|Lustre]] for [[internal:linux-network#directory-structure|data storage]].
  
Currently, the following partitions (queues) are available for computing:

===== Node list by partitions =====

==== cpu-troja ====

| Node name | Thread count | Socket:Core:Thread | RAM (MB) |
| achilles1 | 32 | 2:8:2 | 128810 |
| achilles2 | 32 | 2:8:2 | 128810 |
| achilles3 | 32 | 2:8:2 | 128810 |
| achilles4 | 32 | 2:8:2 | 128810 |
| achilles5 | 32 | 2:8:2 | 128810 |
| achilles6 | 32 | 2:8:2 | 128810 |
| achilles7 | 32 | 2:8:2 | 128810 |
| achilles8 | 32 | 2:8:2 | 128810 |
| hector1 | 32 | 2:8:2 | 128810 |
| hector2 | 32 | 2:8:2 | 128810 |
| hector3 | 32 | 2:8:2 | 128810 |
| hector4 | 32 | 2:8:2 | 128810 |
| hector5 | 32 | 2:8:2 | 128810 |
| hector6 | 32 | 2:8:2 | 128810 |
| hector7 | 32 | 2:8:2 | 128810 |
| hector8 | 32 | 2:8:2 | 128810 |
| helena1 | 32 | 2:8:2 | 128811 |
| helena2 | 32 | 2:8:2 | 128811 |
| helena3 | 32 | 2:8:2 | 128811 |
| helena4 | 32 | 2:8:2 | 128811 |
| helena5 | 32 | 2:8:2 | 128810 |
| helena6 | 32 | 2:8:2 | 128811 |
| helena7 | 32 | 2:8:2 | 128810 |
| helena8 | 32 | 2:8:2 | 128811 |
| paris1 | 32 | 2:8:2 | 128810 |
| paris2 | 32 | 2:8:2 | 128810 |
| paris3 | 32 | 2:8:2 | 128810 |
| paris4 | 32 | 2:8:2 | 128810 |
| paris5 | 32 | 2:8:2 | 128810 |
| paris6 | 32 | 2:8:2 | 128810 |
| paris7 | 32 | 2:8:2 | 128810 |
| paris8 | 32 | 2:8:2 | 128810 |
| hyperion2 | 64 | 2:16:2 | 257667 |
| hyperion3 | 64 | 2:16:2 | 257667 |
| hyperion4 | 64 | 2:16:2 | 257667 |
| hyperion5 | 64 | 2:16:2 | 257667 |
| hyperion6 | 64 | 2:16:2 | 257667 |
| hyperion7 | 64 | 2:16:2 | 257667 |
| hyperion8 | 64 | 2:16:2 | 257667 |
==== cpu-ms ====

| Node name | Thread count | Socket:Core:Thread | RAM (MB) |
| iridium | 16 | 2:4:2 | 515977 |
| orion1 | 40 | 2:10:2 | 128799 |
| orion2 | 40 | 2:10:2 | 128799 |
| orion3 | 40 | 2:10:2 | 128799 |
| orion4 | 40 | 2:10:2 | 128799 |
| orion5 | 40 | 2:10:2 | 128799 |
| orion6 | 40 | 2:10:2 | 128799 |
| orion7 | 40 | 2:10:2 | 128799 |
| orion8 | 40 | 2:10:2 | 128799 |
==== gpu-troja ====

| Node name | Thread count | Socket:Core:Thread | RAM (MB) | Features |
| tdll-3gpu1 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| tdll-3gpu2 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| tdll-3gpu3 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| tdll-3gpu4 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| tdll-8gpu1 | 64 | 2:16:2 | 257666 | gpuram40G gpu_cc8.0 |
| tdll-8gpu2 | 64 | 2:16:2 | 257666 | gpuram40G gpu_cc8.0 |
| tdll-8gpu3 | 32 | 2:8:2 | 253725 | gpuram16G gpu_cc7.5 |
| tdll-8gpu4 | 32 | 2:8:2 | 253725 | gpuram16G gpu_cc7.5 |
| tdll-8gpu5 | 32 | 2:8:2 | 253725 | gpuram16G gpu_cc7.5 |
| tdll-8gpu6 | 32 | 2:8:2 | 253725 | gpuram16G gpu_cc7.5 |
| tdll-8gpu7 | 32 | 2:8:2 | 253725 | gpuram16G gpu_cc7.5 |
==== gpu-ms ====

| Node name | Thread count | Socket:Core:Thread | RAM (MB) | Features |
| dll-3gpu1 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| dll-3gpu2 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| dll-3gpu3 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| dll-3gpu4 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| dll-3gpu5 | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 |
| dll-4gpu1 | 40 | 2:10:2 | 187978 | gpuram24G gpu_cc8.6 |
| dll-4gpu2 | 40 | 2:10:2 | 187978 | gpuram24G gpu_cc8.6 |
| dll-8gpu1 | 64 | 2:16:2 | 515838 | gpuram24G gpu_cc8.0 |
| dll-8gpu2 | 64 | 2:16:2 | 515838 | gpuram24G gpu_cc8.0 |
| dll-8gpu3 | 32 | 2:8:2 | 257830 | gpuram16G gpu_cc8.6 |
| dll-8gpu4 | 32 | 2:8:2 | 253721 | gpuram16G gpu_cc8.6 |
| dll-8gpu5 | 40 | 2:10:2 | 385595 | gpuram16G gpu_cc7.5 |
| dll-8gpu6 | 40 | 2:10:2 | 385595 | gpuram16G gpu_cc7.5 |
| dll-10gpu1 | 32 | 2:8:2 | 257830 | gpuram16G gpu_cc8.6 |
| dll-10gpu2 | 32 | 2:8:2 | 257830 | gpuram11G gpu_cc6.1 |
| dll-10gpu3 | 32 | 2:8:2 | 257830 | gpuram11G gpu_cc6.1 |
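
The GPU features listed above (GPU RAM size and CUDA compute capability) can also be queried from SLURM itself; a short sketch using standard ''sinfo'' format fields:
<code>
sinfo -p gpu-troja,gpu-ms -N -o "%N %f %G"    # node name, features, and generic resources (GPU count/type)
</code>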
==== Submit nodes ====

In order to submit a job you need to log in to one of the head nodes:

   lrc1.ufal.hide.ms.mff.cuni.cz
   lrc2.ufal.hide.ms.mff.cuni.cz
   sol1.ufal.hide.ms.mff.cuni.cz
   sol2.ufal.hide.ms.mff.cuni.cz
   sol3.ufal.hide.ms.mff.cuni.cz
   sol4.ufal.hide.ms.mff.cuni.cz
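
For example, logging in from a workstation and checking that the SLURM client tools respond (a minimal sketch; ''LOGIN'' is a placeholder for your cluster user name):
<code>
ssh LOGIN@lrc1.ufal.hide.ms.mff.cuni.cz    # log in to one of the head nodes
sinfo --version                            # verify that SLURM commands are available
</code>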
===== Basic usage =====
  
<code>
#!/bin/bash
#SBATCH -J helloWorld   # name of job
#SBATCH -p cpu-troja   # name of partition or queue (if not specified, the default partition is used)
#SBATCH -o helloWorld.out   # name of output file for this submission script
#SBATCH -e helloWorld.err   # name of error file for this submission script
</code>

<code>
#SBATCH -N 2                                  # number of nodes (default 1)
#SBATCH --nodelist=node1,node2...             # required node, or comma separated list of required nodes
#SBATCH --cpus-per-task=<n>                   # number of cores/threads per task (default 1)
#SBATCH --gres=gpu:<n>                        # number of GPUs to request (default 0)
#SBATCH --mem=10G                             # request 10 gigabytes memory (per node, default depends on node)
</code>
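
Once the script is saved to a file (''helloWorld.sh'' here is just an assumed name), it can be submitted and monitored with the standard SLURM commands; a minimal sketch:
<code>
sbatch helloWorld.sh    # submit the batch script; prints the assigned job ID
squeue -u $USER         # list your own pending and running jobs
scancel <job_id>        # cancel a job using the numeric ID reported by sbatch
</code>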
List available partitions and nodes:
<code>
sinfo
</code>

List detailed info about nodes:
<code>
sinfo -l -N
</code>
  
List nodes with custom-formatted info:
<code>
sinfo -N -o "%N %P %.11T %.15f"
</code>
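
To show only nodes in a particular state, e.g. idle nodes in the GPU partitions (an additional sketch, not from the original page):
<code>
sinfo -p gpu-troja,gpu-ms -t idle
</code>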
  
=== CPU core allocation ===

The minimal computing resource in SLURM is one CPU core. However, the CPU count advertised by SLURM corresponds to the number of CPU threads.
If you ask for 1 CPU core with ''--cpus-per-task=1'', SLURM will allocate all threads of that core.

For example, on ''dll-8gpu1'' such a request will allocate 2 threads, since its ThreadsPerCore=2:
 +
<code>
$ scontrol show node dll-8gpu1
NodeName=dll-8gpu1 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=0 CPUTot=64 CPULoad=0.05                                               // CPUAlloc - allocated threads, CPUTot - total threads
   AvailableFeatures=gpuram24G
   ActiveFeatures=gpuram24G
   Gres=gpu:nvidia_a30:8(S:0-1)
   NodeAddr=10.10.24.63 NodeHostName=dll-8gpu1 Version=21.08.8-2
   OS=Linux 5.15.35-1-pve #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200)
   RealMemory=515838 AllocMem=0 FreeMem=507650 Sockets=2 Boards=1
   CoreSpecCount=1 CPUSpecList=62-63                                               // CoreSpecCount - cores reserved for OS, CPUSpecList - list of threads reserved for system
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A          // ThreadsPerCore - count of threads for 1 CPU core
   Partitions=gpu-ms
   BootTime=2022-09-01T14:07:50 SlurmdStartTime=2022-09-02T13:54:05
   LastBusyTime=2022-10-02T20:17:09
   CfgTRES=cpu=64,mem=515838M,billing=64
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
</code>

In the example above, the lines relevant to CPU allocation are annotated with comments.
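
A quick way to observe this (a sketch, assuming a ThreadsPerCore=2 node such as those in ''cpu-troja''): even when only one CPU is requested, SLURM reports both threads of the allocated core inside the job.
<code>
srun -p cpu-troja --cpus-per-task=1 --pty bash
echo $SLURM_CPUS_ON_NODE    # typically prints 2 here: both threads of the single allocated core
</code>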
  
  
There are many more parameters available to use. For example:
  
**To get an interactive CPU job with 64GB of reserved memory:**
<code>srun -p cpu-troja,cpu-ms --mem=64G --pty bash</code>
  
  * ''-p cpu-troja,cpu-ms'' explicitly restricts the job to the ''cpu-troja'' and ''cpu-ms'' partitions. If no partition is specified, SLURM uses the default partition.
  * ''--mem=64G'' requires 64G of memory for the job
  
**To get an interactive job with a single GPU of any kind:**
<code>srun -p gpu-troja,gpu-ms --gres=gpu:1 --pty bash</code>
  * ''-p gpu-troja,gpu-ms'' requires only nodes from these two partitions
  * ''--gres=gpu:1'' requires 1 GPU
  
<code>srun -p gpu-troja,gpu-ms --nodelist=tdll-3gpu1 --mem=64G --gres=gpu:2 --pty bash</code>
  * ''-p gpu-troja,gpu-ms'' requires only nodes from these two partitions
  * ''--nodelist=tdll-3gpu1'' explicitly requires one specific node
  * ''--gres=gpu:2'' requires 2 GPUs

<code>srun -p gpu-troja --constraint="gpuram48G|gpuram40G" --mem=64G --gres=gpu:2 --pty bash</code>
  * ''--constraint="gpuram48G|gpuram40G"'' only considers nodes that have either the ''gpuram48G'' or the ''gpuram40G'' feature defined
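
Inside an interactive GPU job it is worth checking what was actually allocated; a small sketch (assumes the NVIDIA driver tools are present on the GPU nodes and the usual SLURM GPU isolation is in effect):
<code>
nvidia-smi                    # list the GPUs visible to this job
echo $CUDA_VISIBLE_DEVICES    # typically limited to the GPU indices allocated by SLURM
</code>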
  
To see all the available options type:
  
<code>man srun</code>

===== See also =====

https://www.msi.umn.edu/slurm/pbs-conversion
  
