Differences

This shows you the differences between two versions of the page.

--- gpu [2018/12/01 21:09]
popel [Rules] -pe smp now works with gpu
+++ gpu [2019/02/11 10:50]
naplava [Servers with GPU units]
@@ Line 5: / Line 5: @@
 ===== Servers with GPU units =====
 GPU cluster ''gpu-ms.q'' at Malá Strana:
-| machine | GPU type | GPU driver version | [[https://en.wikipedia.org/wiki/CUDA#GPUs_supported|cc]] | GPU cnt | GPU RAM (GB) | machine RAM (GB)| AVX |
-| dll1 |  GeForce GTX 1080 |  396.24 |  6.1 |  8 |  8 |  249 | yes |
+| machine | GPU type | GPU driver version | [[https://en.wikipedia.org/wiki/CUDA#GPUs_supported|cc]] | GPU cnt | GPU RAM (GB) | machine RAM (GB)|
-| dll2 (out of order) |  GeForce GTX 1080 |  396.24 |  6.1 |  8 |  8 |  249 | yes |
+| dll1 |  GeForce GTX 1080 |  396.24 |  6.1 |  8 |  8 |  249 |
-| dll3 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  249 | yes |
+| dll2 |  GeForce GTX 1080 |  396.24 |  6.1 |  8 |  8 |  249 |
-| dll4 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  249 | yes |
+| dll3 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  249 |
-| dll5 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  249 | yes |
+| dll4 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  249 |
-| dll6 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  123 | yes |
+| dll5 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  249 |
-| kronos |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  1 |  11 |  123 | yes |
+| dll6 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  10 |  11 |  123 |
-| titan1 |  GeForce GTX 1080 |  396.24 |  6.1 |  1 |  8 |  30 | yes |
+| dll7 |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  1 |  11 |  123 |
-| titan2 |  Tesla K40c |  396.24 |  3.5 |  1 |  11 |  30 | yes |
+| kronos |  GeForce GTX 1080 Ti |  396.24 |  6.1 |  1 |  11 |  123 |
-| twister1 |  Tesla K40c |  396.24 |  3.5 |  1 |  11 |  45 | no |
+| titan1 |  GeForce GTX 1080 |  396.24 |  6.1 |  1 |  8 |  30 |
-| twister2 |  Tesla K40c |  396.24 |  3.5 |  1 |  11 |  45 | no |
+| titan2 |  Tesla K40c |  396.24 |  3.5 |  1 |  11 |  30 |
+GPU cluster ''gpu-troja.q'' at Troja:
+| machine | GPU type | GPU driver version | [[https://en.wikipedia.org/wiki/CUDA#GPUs_supported|cc]] | GPU cnt | GPU RAM (GB) | machine RAM (GB)|
+| tdll1 |  Quadro P5000 |  410.48 |  6.1 |  8 |  16 |  245 |
+| tdll2 |  Quadro P5000 |  410.48 |  6.1 |  8 |  16 |  245 |
+| tdll3 |  Quadro P5000 |  410.48 |  6.1 |  8 |  16 |  245 |
+| tdll4 |  Quadro P5000 |  410.48 |  6.1 |  8 |  16 |  245 |
+| tdll5 |  Quadro P5000 |  410.48 |  6.1 |  8 |  16 |  245 |
 Desktop machines:
@@ Line 23: / Line 32: @@
 | athena                     | GeForce GTX 1080 | cc6.1 |  1 |  8 GB | Tom's desktop machine |
-Not used at the moment: GeForce GTX 570 (from twister2)
 Multiple versions of the CUDA library are accessible on each machine, together with cuDNN. Theano and TensorFlow are supported.
@@ Line 31: / Line 39: @@
 ===== Rules =====
   * First, read [[internal:Linux network]] and [[:Grid]].
-  * All the rules from [[:Grid]] apply, even more strictly than for CPU because there are too many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on GPU. Don't forget to specify your RAM requirements with e.g. ''-l mem_free=8G,act_mem_free=8G,h_vmem=12G''.
+  * All the rules from [[:Grid]] apply, even more strictly than for CPU because there are too many GPU users and not as many GPUs available. So as a reminder: always use GPUs via ''qsub'' (or ''qrsh''), never via ''ssh''. You can ssh to any machine e.g. to run ''nvidia-smi'' or ''htop'', but not to start computing on GPU. Don't forget to specify your RAM requirements with e.g. ''-l mem_free=8G,act_mem_free=8G,h_data=12G''.
+    * **Note that you need to use ''h_data'' instead of ''h_vmem'' for GPU jobs.** The CUDA driver allocates a lot of "unused" virtual memory (tens of GB per card), which is counted in ''h_vmem'' but not in ''h_data''. All usual allocations (''malloc'', ''new'', Python allocations) seem to be included in ''h_data''.
   * Always specify the number of GPU cards (e.g. ''gpu=1''), the minimum CUDA compute capability you need (e.g. ''gpu_cc_min3.5=1'') and your GPU memory requirements (e.g. ''gpu_ram=2G''). Thus e.g. <code>qsub -q gpu-ms.q -l gpu=1,gpu_cc_min3.5=1,gpu_ram=2G</code>
   * If you need more than one GPU card (on a single machine), always request as many CPU cores (''-pe smp X'') as GPU cards. E.g. <code>qsub -q gpu-ms.q -l gpu=4,gpu_cc_min3.5=1,gpu_ram=7G -pe smp 4</code>
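
The rules above can be combined into one submission command. The following is only a sketch: the helper ''build_gpu_qsub'' and the script name ''job.sh'' are hypothetical and not part of the wiki, and it assumes the same ''gpu_cc_min3.5'' resource is also defined on ''gpu-troja.q''. It prints the command instead of running it, so it can be tried on a machine without ''qsub''.

```shell
#!/bin/sh
# Hypothetical helper (not from the wiki): print a qsub command line that
# follows the GPU rules. It only prints the command; it does not submit it.
# Usage: build_gpu_qsub QUEUE GPUS GPU_RAM MEM H_DATA SCRIPT
build_gpu_qsub() {
    queue=$1; gpus=$2; gpu_ram=$3; mem=$4; h_data=$5; script=$6

    # Number of cards, minimum compute capability, and GPU memory.
    cmd="qsub -q $queue -l gpu=$gpus,gpu_cc_min3.5=1,gpu_ram=$gpu_ram"

    # Machine RAM: h_data instead of h_vmem, because the CUDA driver
    # reserves virtual memory that would count against h_vmem.
    cmd="$cmd -l mem_free=$mem,act_mem_free=$mem,h_data=$h_data"

    # More than one card: request as many CPU cores as GPU cards.
    if [ "$gpus" -gt 1 ]; then
        cmd="$cmd -pe smp $gpus"
    fi

    printf '%s\n' "$cmd $script"
}

# job.sh is a placeholder name for your own job script.
build_gpu_qsub gpu-ms.q 1 2G 8G 12G job.sh
build_gpu_qsub gpu-troja.q 4 7G 8G 12G job.sh
```

Check the printed command first, then paste it into a terminal to submit the job.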


Institute of Formal and Applied Linguistics Wiki
