
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

slurm [2023/01/19 15:44] vodrazka [gpu-ms]
slurm [2023/04/05 13:39] vodrazka [gpu-ms]
  
LRC (Linguistic Research Cluster) is the name of ÚFAL's computational grid/cluster. The cluster is built on top of [[https://slurm.schedmd.com/|SLURM]] and uses [[https://www.lustre.org/|Lustre]] for [[internal:linux-network#directory-structure|data storage]].

See Milan Straka's intro to Slurm (and Spark if you want):

  * https://lectures.ms.mff.cuni.cz/video/rec/npfl118/2223/npfl118-2223-winter-slurm.mp4
  * https://lectures.ms.mff.cuni.cz/video/rec/npfl118/2223/npfl118-2223-winter-spark.mp4
  
Currently, the following partitions (queues) are available for computing:
| dll-3gpu[1-5] | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 | NVIDIA A40 |
| dll-4gpu[1,2] | 40 | 2:10:2 | 187978 | gpuram24G gpu_cc8.6 | NVIDIA RTX 3090 |
| dll-4gpu3 | 62 | 1:32:2 | 515652 | gpuram48G gpu_cc8.9 | NVIDIA L40 |
| dll-4gpu4 | 30 | 1:16:2 | 257616 | gpuram48G gpu_cc8.6 | NVIDIA A40 |
| dll-8gpu[1,2] | 64 | 2:16:2 | 515838 | gpuram24G gpu_cc8.0 | NVIDIA A30 |
| dll-8gpu[3,4] | 32 | 2:8:2 | 257830 | gpuram16G gpu_cc8.6 | NVIDIA RTX A4000 |
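The node details in the table can also be queried live from the cluster. A possible sketch using sinfo's standard format specifiers (%N node list, %c CPUs, %m memory in MB, %f feature tags, %G generic resources):

```shell
# One line per node: node name, CPU count, memory, feature tags, GRES (GPUs).
# -N switches to node-oriented output; -o sets the column format.
sinfo -N -o "%N %c %m %f %G"
```

This only works on a machine with the Slurm client tools installed, i.e. on the cluster itself.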
<code>
man sbatch
</code>
 +

=== Rudolf's template ===

The main point is that the log file names automatically include the job name and job id.
 +
<code>
#SBATCH -J RuRjob                           # job name (%x)
#SBATCH -o %x.%j.out                        # STDOUT log: <job name>.<job id>.out
#SBATCH -e %x.%j.err                        # STDERR log: <job name>.<job id>.err
#SBATCH -p gpu-troja                        # partition
#SBATCH --gres=gpu:1                        # request one GPU
#SBATCH --mem=16G                           # request 16 GB of RAM
#SBATCH --constraint="gpuram16G|gpuram24G"  # accept either GPU RAM feature

# Print each command to STDERR before executing (expanded), prefixed by "+ "
set -o xtrace
</code>
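Submission is the usual sbatch call; the script name and job id below are illustrative, not taken from the wiki. The %x/%j expansion in the log file names can be emulated in plain shell to see what to expect:

```shell
# sbatch train.sh   # illustrative script containing the directives above;
                    # Slurm replies e.g. "Submitted batch job <id>"

# Emulate how %x (job name) and %j (job id) expand in the -o/-e patterns:
JOB_NAME=RuRjob   # value of "#SBATCH -J"
JOB_ID=123456     # illustrative job id assigned by Slurm
echo "${JOB_NAME}.${JOB_ID}.out"   # → RuRjob.123456.out
```

With the template above, each run therefore leaves a distinct pair of .out/.err files, so repeated submissions never overwrite each other's logs.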
  
  
<code>man srun</code>

==== Basic commands on cluster machines ====

  lspci
    # is any such hardware there?
  nvidia-smi
    # more details, incl. running processes on the GPU
    # nvidia-* are typically located in /usr/bin
  watch nvidia-smi
    # for monitoring GPU activity in a separate terminal (thanks to Jindrich Libovicky for this!)
    # you can also use nvidia-smi -l TIME
  nvcc --version
    # this should tell the CUDA version
    # nvcc is typically installed in /usr/local/cuda/bin/
  theano-test
    # does this actually do anything useful? :-)
    # theano-* are typically located in /usr/local/bin/
  /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
    # shows CUDA capability etc.
  ssh dll1 ~popel/bin/gpu_allocations
    # who occupies which card on a given machine
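For scripting rather than interactive inspection, nvidia-smi also has a machine-readable query mode; a sketch using its standard --query-gpu fields:

```shell
# CSV line per GPU: index, model name, used and total memory.
# "noheader" drops the header row so the output is easy to parse in scripts.
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv,noheader
```

This obviously only runs on a machine with an NVIDIA driver installed, i.e. one of the GPU nodes.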
  
===== See also =====
