gpu [2017/10/17 16:39] popel [Using cluster] |
gpu [2017/11/23 14:15] bojar link to munin graphs |
Not used at the moment: GeForce GTX 570 (from twister2) | Not used at the moment: GeForce GTX 570 (from twister2) |
All machines have CUDA 8.0 and should support both Theano and TensorFlow. | All machines have CUDA 8.0 and should support both Theano and TensorFlow. |
| |
| [[https://ufaladm2.ufal.hide.ms.mff.cuni.cz/munin/ufal.hide.ms.mff.cuni.cz/lrc-headnode.ufal.hide.ms.mff.cuni.cz/index.html#dll|GPU usage rolling graphs]] |
| |
| |
===== Rules ===== | ===== Rules ===== |
* All the rules from [[:Grid]] apply, even more strictly than for CPU, because there are too many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine, e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on GPU. Don't forget to specify your RAM requirements with e.g. ''-l mem_free=8G,act_mem_free=8G,h_vmem=12G''. | * All the rules from [[:Grid]] apply, even more strictly than for CPU, because there are too many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine, e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on GPU. Don't forget to specify your RAM requirements with e.g. ''-l mem_free=8G,act_mem_free=8G,h_vmem=12G''. |
* Always specify the number of GPU cards (e.g. ''gpu=1''), the minimal CUDA capability you need (e.g. ''gpu_cc_min3.5=1'') and your GPU memory requirements (e.g. ''gpu_ram=2G''). Thus e.g. <code>qsub -q gpu.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G</code> | * Always specify the number of GPU cards (e.g. ''gpu=1''), the minimal CUDA capability you need (e.g. ''gpu_cc_min3.5=1'') and your GPU memory requirements (e.g. ''gpu_ram=2G''). Thus e.g. <code>qsub -q gpu.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G</code> |
* If you need more than one GPU card, always request as many CPU cores as GPU cards. E.g. <code>qsub -q gpu.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code> | * If you need more than one GPU card (on a single machine), always request as many CPU cores (''-pe smp X'') as GPU cards. E.g. <code>qsub -q gpu.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code> |
* For interactive jobs, you can use ''qrsh'', but make sure to end your job as soon as you don't need the GPU (so don't use qrsh for long training). E.g. <code>qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash</code> | * For interactive jobs, you can use ''qrsh'', but make sure to end your job as soon as you don't need the GPU (so don't use qrsh for long training). **Warning: ''-pty yes bash'' is necessary**, otherwise the variable ''$CUDA_VISIBLE_DEVICES'' will not be set correctly. E.g. <code>qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash</code> In general: don't reserve a GPU (as described above) without actually using it for a longer time. Ondřej Bojar has a script ''/home/bojar/tools/servers/watch_gpus'' that watches for reserved but unused GPUs on most machines and will e-mail you, but don't rely on it alone. |
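The warning about ''$CUDA_VISIBLE_DEVICES'' can be turned into a small guard at the top of a job script. This is only a sketch: the function name ''check_gpu_env'' is hypothetical and not part of the cluster tooling, and it only catches the case where the variable is empty or unset, not the case where it holds a wrong value.

```shell
# Hypothetical guard (assumption, not an existing cluster script):
# refuse to continue when no GPU has been assigned to this shell.
check_gpu_env() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # Unset or empty: the job was probably not started through the scheduler,
        # or qrsh was run without "-pty yes bash".
        echo "CUDA_VISIBLE_DEVICES is not set; start the job via qsub or 'qrsh ... -pty yes bash'." >&2
        return 1
    fi
    echo "Using GPU(s): ${CUDA_VISIBLE_DEVICES}"
}
```

Calling ''check_gpu_env || exit 1'' before starting any computation stops the script early instead of letting it run without an assigned GPU.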
| |
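The resource rules above can be collected in one place so that they are not forgotten when submitting. The following is a minimal sketch, assuming a hypothetical helper named ''gpu_qsub_opts''; it only prints the options already shown in the rules (queue ''gpu.q'', ''gpu'', ''gpu_cc_min3.5'', ''gpu_ram'', ''-pe smp'') and does not call ''qsub'' itself. RAM options such as ''mem_free'' still have to be added separately.

```shell
# Hypothetical helper (assumption, not part of the cluster tooling).
# $1 = number of GPU cards, $2 = GPU memory per card (e.g. 2G)
gpu_qsub_opts() {
    _opts="-q gpu.q -l gpu=$1,gpu_cc_min3.5=1,gpu_ram=$2"
    # For multi-GPU jobs, request as many CPU cores as GPU cards.
    if [ "$1" -gt 1 ]; then
        _opts="$_opts -pe smp $1"
    fi
    printf '%s\n' "$_opts"
}

gpu_qsub_opts 1 2G
gpu_qsub_opts 4 7G
```

The printed options could then be passed on, e.g. ''qsub $(gpu_qsub_opts 4 7G) train.sh'', where ''train.sh'' stands for your own job script.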
===== How to use cluster ===== | ===== How to use cluster ===== |