Differences

This shows you the differences between two versions of the page.

--- gpu [2017/10/17 16:38]
popel [Rules]
+++ gpu [2017/11/13 09:43]
bojar [Rules]
@@ Line 27: / Line 27: @@
   * All the rules from [[:Grid]] apply, even more strictly than for CPUs, because there are many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine, e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on a GPU. Don't forget to specify your RAM requirements, e.g. ''-l mem_free=8G,act_mem_free=8G,h_vmem=12G''.
   * Always specify the number of GPU cards (e.g. ''gpu=1''), the minimal Cuda capability you need (e.g. ''gpu_cc_min3.5=1'') and your GPU memory requirements (e.g. ''gpu_ram=2G''). For example: <code>qsub -q gpu.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G</code>
-  * If you need more than one GPU card, always request as many CPU cores as GPU cards. E.g. <code>qsub -q gpu.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code>
+  * If you need more than one GPU card (on a single machine), always request as many CPU cores (''-pe smp X'') as GPU cards. E.g. <code>qsub -q gpu.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code>
-  * For interactive jobs, you can use ''qrsh'', but make sure to end your job as soon as you no longer need the GPU (so don't use qrsh for long training). E.g. <code>qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash</code>
+  * For interactive jobs, you can use ''qrsh'', but make sure to end your job as soon as you no longer need the GPU (so don't use qrsh for long training). **Warning: ''-pty yes bash'' is necessary**, otherwise the variable ''$CUDA_VISIBLE_DEVICES'' will not be set correctly. E.g. <code>qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash</code> In general, don't reserve a GPU (as described above) and then leave it unused for a long time. Ondřej Bojar has a script, /home/bojar/tools/servers/watch_gpus, that watches for reserved but unused GPUs on most machines and e-mails you, but don't rely on it alone.
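The rules in the list above can be collected into a single set of ''qsub'' options. The following is a minimal sketch only: the function name ''gpu_qsub_args'', its argument order and the fixed minimal Cuda capability 3.5 are hypothetical choices for illustration, not an existing tool on the cluster. It only prints the options and never calls ''qsub'' itself.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper (not an existing cluster tool): prints qsub options that
# follow the GPU rules above. Arguments:
#   $1 number of GPU cards, $2 GPU memory (gpu_ram),
#   $3 host RAM for mem_free/act_mem_free, $4 h_vmem limit.
# The minimal Cuda capability is fixed to 3.5 here as an assumption.
gpu_qsub_args() {
  local gpus="$1" gpu_ram="$2" mem="$3" vmem="$4"
  local args="-q gpu.q -l gpu=${gpus},gpu_cc_min3.5=1,gpu_ram=${gpu_ram},mem_free=${mem},act_mem_free=${mem},h_vmem=${vmem}"
  # More than one card on a single machine: ask for one CPU core per GPU card.
  if [ "$gpus" -gt 1 ]; then
    args="${args} -pe smp ${gpus}"
  fi
  printf '%s\n' "$args"
}
</code>

For example, ''qsub $(gpu_qsub_args 4 7G 8G 12G) train.sh'' (where ''train.sh'' stands for your own job script) would produce the multi-GPU options from the list above, extended with the RAM limits. The ''$(...)'' is left unquoted on purpose so that the options are split into separate words.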
 ===== How to use cluster =====
@@ Line 76: / Line 76: @@
 ==== Using cluster ====
-Rule number one: always use the GPU queue (never log in to a machine by ssh). Always use qsub or qsubmit with proper arguments.
+As an alternative to ''qsub'', you can use /home/bojar/tools/shell/qsubmit:
-For testing and using the cluster interactively, you can use qrsh (this should not be used for long-running experiments, since the console is not closed at the end of the experiment). The following command will assign you a GPU and create an interactive console:
-  qrsh -q gpu.q -l gpu=1,gpu_ram=2G -pty yes bash
-For running experiments you must use the qsub command:
-  qsub -q gpu.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G WHAT_SHOULD_BE_RUN
-A cleaner way to use the cluster is with /home/bojar/tools/shell/qsubmit
   qsubmit --gpumem=2G --queue="gpu.q" WHAT_SHOULD_BE_RUN
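The warning about ''-pty yes bash'' implies that the GPU queue tells a job which cards it was assigned through ''$CUDA_VISIBLE_DEVICES''. Under that assumption, a job script passed to ''qsub'' or ''qsubmit'' can check the variable before it starts computing. This is a sketch; ''require_gpu_assignment'' is a hypothetical name, not part of ''qsubmit'' or any cluster tool.

<code bash>
#!/usr/bin/env bash
# Hypothetical guard for the start of a job script run via qsub, qsubmit or
# "qrsh ... -pty yes bash". Assumes the GPU queue exports CUDA_VISIBLE_DEVICES
# with the assigned card(s). If it is unset, CUDA would see every card on the
# machine, so the guard stops the job instead of letting it pick GPUs at random.
require_gpu_assignment() {
  if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "CUDA_VISIBLE_DEVICES is not set; request a GPU with qsub/qrsh -l gpu=N" >&2
    return 1
  fi
  echo "Using GPU(s): ${CUDA_VISIBLE_DEVICES}"
}

# Example use inside a job script (train.py is a placeholder):
#   require_gpu_assignment || exit 1
#   python train.py
</code>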

Institute of Formal and Applied Linguistics Wiki