
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

slurm [2023/01/19 15:44] vodrazka [gpu-ms]
slurm [2023/04/05 13:39] vodrazka [gpu-ms]
  
LRC (Linguistic Research Cluster) is the name of ÚFAL's computational grid/cluster. The cluster is built on top of [[https://slurm.schedmd.com/|SLURM]] and uses [[https://www.lustre.org/|Lustre]] for [[internal:linux-network#directory-structure|data storage]].

See Milan Straka's intro to Slurm (and Spark if you want):

  * https://lectures.ms.mff.cuni.cz/video/rec/npfl118/2223/npfl118-2223-winter-slurm.mp4
  * https://lectures.ms.mff.cuni.cz/video/rec/npfl118/2223/npfl118-2223-winter-spark.mp4
  
Currently, the following partitions (queues) are available for computing:
| dll-3gpu[1-5] | 64 | 2:16:2 | 128642 | gpuram48G gpu_cc8.6 | NVIDIA A40 |
| dll-4gpu[1,2] | 40 | 2:10:2 | 187978 | gpuram24G gpu_cc8.6 | NVIDIA RTX 3090 |
| dll-4gpu3 | 62 | 1:32:2 | 515652 | gpuram48G gpu_cc8.9 | NVIDIA L40 |
| dll-4gpu4 | 30 | 1:16:2 | 257616 | gpuram48G gpu_cc8.6 | NVIDIA A40 |
| dll-8gpu[1,2] | 64 | 2:16:2 | 515838 | gpuram24G gpu_cc8.0 | NVIDIA A30 |
| dll-8gpu[3,4] | 32 | 2:8:2 | 257830 | gpuram16G gpu_cc8.6 | NVIDIA RTX A4000 |
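The node details in the table can also be queried live from the cluster. A possible sketch using sinfo's standard format specifiers (%N node list, %c CPUs, %m memory in MB, %f feature tags, %G generic resources):

```shell
# One line per node: node name, CPU count, memory, feature tags, GRES (GPUs).
# -N switches to node-oriented output; -o sets the column format.
sinfo -N -o "%N %c %m %f %G"
```

This only works on a machine with the Slurm client tools installed, i.e. on the cluster itself.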
<code>
man sbatch
</code>
 +

=== Rudolf's template ===

The main point is that the log file names automatically include the job name and job id.
 +
<code>
#SBATCH -J RuRjob                           # job name (%x)
#SBATCH -o %x.%j.out                        # STDOUT log: <job name>.<job id>.out
#SBATCH -e %x.%j.err                        # STDERR log: <job name>.<job id>.err
#SBATCH -p gpu-troja                        # partition
#SBATCH --gres=gpu:1                        # request one GPU
#SBATCH --mem=16G                           # request 16 GB of RAM
#SBATCH --constraint="gpuram16G|gpuram24G"  # accept either GPU RAM feature

# Print each command to STDERR before executing (expanded), prefixed by "+ "
set -o xtrace
</code>
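Submission is the usual sbatch call; the script name and job id below are illustrative, not taken from the wiki. The %x/%j expansion in the log file names can be emulated in plain shell to see what to expect:

```shell
# sbatch train.sh   # illustrative script containing the directives above;
                    # Slurm replies e.g. "Submitted batch job <id>"

# Emulate how %x (job name) and %j (job id) expand in the -o/-e patterns:
JOB_NAME=RuRjob   # value of "#SBATCH -J"
JOB_ID=123456     # illustrative job id assigned by Slurm
echo "${JOB_NAME}.${JOB_ID}.out"   # → RuRjob.123456.out
```

With the template above, each run therefore leaves a distinct pair of .out/.err files, so repeated submissions never overwrite each other's logs.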
  
  
<code>man srun</code>

==== Basic commands on cluster machines ====

  lspci
    # is any such hardware there?
  nvidia-smi
    # more details, incl. running processes on the GPU
    # nvidia-* are typically located in /usr/bin
  watch nvidia-smi
    # for monitoring GPU activity in a separate terminal (thanks to Jindrich Libovicky for this!)
    # you can also use nvidia-smi -l TIME
  nvcc --version
    # this should tell the CUDA version
    # nvcc is typically installed in /usr/local/cuda/bin/
  theano-test
    # does this actually do anything useful? :-)
    # theano-* are typically located in /usr/local/bin/
  /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
    # shows CUDA capability etc.
  ssh dll1 ~popel/bin/gpu_allocations
    # who occupies which card on a given machine
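For scripting rather than interactive inspection, nvidia-smi also has a machine-readable query mode; a sketch using its standard --query-gpu fields:

```shell
# CSV line per GPU: index, model name, used and total memory.
# "noheader" drops the header row so the output is easy to parse in scripts.
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv,noheader
```

This obviously only runs on a machine with an NVIDIA driver installed, i.e. one of the GPU nodes.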
  
===== See also =====
