Both sides previous revision
Previous revision
Next revision
|
Previous revision
Next revision
Both sides next revision
|
gpu [2017/10/17 16:39] popel [Using cluster] |
gpu [2017/11/13 09:43] bojar [Rules] |
* All the rules from [[:Grid]] apply, even more strictly than for CPU because there are too many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on GPU. Don't forget to specify you RAM requirements with e.g. ''-l mem_free=8G,act_mem_free=8G,h_vmem=12G''. | * All the rules from [[:Grid]] apply, even more strictly than for CPU because there are too many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on GPU. Don't forget to specify you RAM requirements with e.g. ''-l mem_free=8G,act_mem_free=8G,h_vmem=12G''. |
* Always specify the number of GPU cards (e.g. ''gpu=1''), the minimal Cuda capability you need (e.g. ''gpu_cc_min3.5=1'') and you GPU memory requirements (e.g. ''gpu_ram=2G''). Thus e.g. <code>qsub -q gpu.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G</code> | * Always specify the number of GPU cards (e.g. ''gpu=1''), the minimal Cuda capability you need (e.g. ''gpu_cc_min3.5=1'') and you GPU memory requirements (e.g. ''gpu_ram=2G''). Thus e.g. <code>qsub -q gpu.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G</code> |
* If you need more than one GPU card, always require as many CPU cores as many GPU cards you need. E.g. <code>qsub -q gpu.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code> | * If you need more than one GPU card (on a single machine), always require as many CPU cores (''-pe cmp X'') as many GPU cards you need. E.g. <code>qsub -q gpu.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code> |
* For interactive jobs, you can use ''qrsh'', but make sure to end your job as soon as you don't need the GPU (so don't use qrsh for long training). E.g. <code>qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash</code> | * For interactive jobs, you can use ''qrsh'', but make sure to end your job as soon as you don't need the GPU (so don't use qrsh for long training). **Warning: ''-pty yes bash'' is necessary**, otherwise the variable ''$CUDA_VISIBLE_DEVICES'' will not be set correctly. E.g. <code>qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash</code>In general: don't reserve a GPU (as described above) without actually using it for longer time. Ondřej Bojar has a script /home/bojar/tools/servers/watch_gpus for watching reserved but unused GPU on most machines which will e-mail you, but don't rely on in only. |
| |
===== How to use cluster ===== | ===== How to use cluster ===== |