Differences

This shows you the differences between two versions of the page.

--- gpu [2017/07/19 16:18] kocmanek [Performance tests]
+++ gpu [2017/10/11 13:07] bojar people should not set CUDA_VISIBLE_DEVICES
@@ Line 4: / Line 4: @@
 ===== Servers with GPU units =====
+GPU cluster ''gpu.q'' at Malá Strana:
-| machine                    | GPU; [[https://en.wikipedia.org/wiki/CUDA#Supported_GPUs|Capability]] [cc]  | cores | GPU RAM | Comment |
+| machine                    | GPU type | [[https://en.wikipedia.org/wiki/CUDA#GPUs_supported|cc]] | GPUs | GPU RAM | Comment |
-| titan                      | GeForce GTX 1080 Ti; cc6.1 | 1  | 11 GB           |  |
+| iridium                    | Quadro K2000        |  cc3.0 |   1|   2 GB |  |
-| titan-gpu                  | GeForce GTX Titan Z; cc3.5 | 2  | 6 GB each core  |  |
+| titan-gpu                  | GeForce GTX Titan Z |  cc3.5 |   2|   6 GB |  |
-| twister1; twister2; kronos | Tesla K40c; cc3.5          | 1  | 12 GB           |  |
+| twister1; twister2; kronos | Tesla K40c          |  cc3.5 |   1|  12 GB |  |
-| iridium                    | Quadro K2000; cc3.0        | 1  | 2 GB            |  |
+| dll1; dll2                 | GeForce GTX 1080    |  cc6.1 |   8|   8 GB |  |
-| victoria; arc              | GeForce GT 630; cc3.0      | 1  | 2 GB            | desktop machine |
+| titan                      | GeForce GTX 1080    |  cc6.1 |   1|   8 GB |  |
-| athena                     | GeForce GTX 1080; cc6.1    | 1  | 8 GB            | Tom's desktop machine |
+| dll3; dll4; dll5           | GeForce GTX 1080 Ti |  cc6.1 |  10|  11 GB | dll3 has only 9 GPUs since 2017/07 |
-| dll1; dll2                 | GeForce GTX 1080; cc6.1    | 8  | 8 GB each core  |  |
+| dll6                       | GeForce GTX 1080 Ti |  cc6.1 |   3|  11 GB |  |
-| dll3; dll4; dll5           | GeForce GTX 1080 Ti; cc6.1 | 10 | 11 GB each core |  |
-not used at the moment: GeForce GTX 570 (from twister2)
+Desktop machines:
+| machine                    | GPU type | [[https://en.wikipedia.org/wiki/CUDA#GPUs_supported|cc]] | GPUs | GPU RAM | Comment |
+| victoria; arc              | GeForce GT 630   | cc3.0 |  1 |  2 GB | desktop machine |
+| athena                     | GeForce GTX 1080 | cc6.1 |  1 |  8 GB | Tom's desktop machine |
+Not used at the moment: GeForce GTX 570 (from twister2)
 All machines have CUDA8.0 and should support both Theano and TensorFlow.
-Summary of future plans:
+=== Disk space ===
-  * Current Troja servers won't get any GPUs (the only option would be [[http://www.czc.cz/hp-quadro-k1200-4gb/171662/produkt?ppcbee-adtext-variant=Produkt%3B+kategorie+%2B+cena%3B+Pobo%C4%8Dky&gclid=CKbKkbrWrswCFQUq0wodHDELCw|Quadro K1200 4GB]], horribly cost-inefficient)
+All the GPU machines are at Malá Strana (not at Troja), so do not use ''/lnet/tspec/work/''. Use one of the following instead:
-  * The old Quadro K2000 we have is a much more low end piece, so we can't test is in Troja.
+- ''/lnet/spec/work/'' (alias ''/net/work/'') - Lustre disk space at Malá Strana
-  * There is MetaCentrum which also has GPUs, so testing can be done there.
+- ''/net/cluster/TMP'' - NFS hard disk for temporary files, so slower than Lustre for most tasks
-  * It is impossible (wasteful in terms of space and forbidden by a dean regulation) to put non-rack machines to our servers rooms. So we won't be buying GeForce GTX 1080 (~20000CZK, out of stock now), for a non-rack machine since we most likely don't have any available.
+- ''/net/cluster/SSD'' - also NFS, but faster than TMP because it is on SSD
-  * Yes, there are grant applications under review which include rack machines with GPUs, e.g. 5x2 or something like that; more will be known in 2017.
+- ''/COMP.TMP'' - local (for each machine) space for temporary files (use it instead of ''/tmp''; over-filling ''/COMP.TMP'' should not halt the system).
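
The choice between these locations can be scripted. The following is only a minimal sketch, assuming a job wants machine-local scratch space when it is available and the NFS temporary disk otherwise; the helper name ''pick_scratch_dir'' and the variable ''SCRATCH_DIR'' are hypothetical and not part of the cluster setup.

```shell
# Hypothetical helper (not part of the cluster setup): print the first
# candidate directory that exists and is writable; fail if none qualifies.
pick_scratch_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Prefer the machine-local space; fall back to the NFS temporary disk.
if SCRATCH_DIR=$(pick_scratch_dir /COMP.TMP /net/cluster/TMP); then
  echo "Using scratch directory: $SCRATCH_DIR"
else
  echo "No scratch directory found on this machine" >&2
fi
```

The fallback order is an assumption; adjust it if your job is better served by ''/net/cluster/SSD''.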
 === Individual acquisitions: NVIDIA Academic Hardware Grants ===
@@ Line 43: / Line 47: @@
 This section explains how to use the cluster properly.
+==== Set-up CUDA and CUDNN ====
+You can add the following commands to your ''~/.bashrc'':
+  CUDNN_version=6.0
+  CUDA_version=8.0
+  CUDA_DIR_OPT=/opt/cuda-$CUDA_version
+  if [ -d "$CUDA_DIR_OPT" ] ; then
+    CUDA_DIR=$CUDA_DIR_OPT
+    export CUDA_HOME=$CUDA_DIR
+    export THEANO_FLAGS="cuda.root=$CUDA_HOME,device=gpu,floatX=float32"
+    export PATH=$PATH:$CUDA_DIR/bin
+    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_DIR/cudnn/$CUDNN_version/lib64:$CUDA_DIR/lib64
+    export CPATH=$CUDA_DIR/cudnn/$CUDNN_version/include:$CPATH
+  fi
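
To confirm in a new shell that the settings above took effect, a check along these lines may help. It is a sketch only: the helper ''in_path_list'' is hypothetical, and the default ''/opt/cuda-8.0'' simply mirrors the ''CUDA_DIR_OPT'' value used above.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1 (for example the value of PATH).
in_path_list() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: CUDA_HOME was exported by the ~/.bashrc snippet above.
cuda_bin="${CUDA_HOME:-/opt/cuda-8.0}/bin"
if in_path_list "$PATH" "$cuda_bin"; then
  echo "CUDA binaries are on PATH: $cuda_bin"
else
  echo "CUDA binaries are NOT on PATH: $cuda_bin" >&2
fi
```

The same helper can be applied to ''LD_LIBRARY_PATH'' to check the CUDNN library directory.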
 ==== TensorFlow Environment ====
@@ Line 95: / Line 116: @@
   /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
     # shows CUDA capability etc.
+  ssh dll1; ~popel/bin/gpu_allocations
+    # who occupies which card on a given machine
 === Select GPU device ===
-Use variable CUDA_VISIBLE_DEVICES to constrain tensorflow to compute only on the selected one. For the use of first GPU use (GPU queue do this for you):
+The variable CUDA_VISIBLE_DEVICES constrains TensorFlow and other toolkits to compute only on the selected GPUs. **Do not set this variable yourself** (unless you are debugging SGE); SGE sets it for you automatically when you request GPUs (see above).
-  export CUDA_VISIBLE_DEVICES=0
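
A job script can still read the variable without modifying it, for example to log how many GPUs it was given. This is a sketch under the assumption that the value is a comma-separated list of device indices; the helper name ''count_gpus'' is hypothetical.

```shell
# Hypothetical helper: count the non-empty entries of a comma-separated
# device list such as the value of CUDA_VISIBLE_DEVICES.
count_gpus() {
  # grep -c prints 0 and exits non-zero on an empty list, hence "|| true".
  printf '%s\n' "$1" | tr ',' '\n' | grep -c . || true
}

# Read-only use: the variable is inspected, never assigned.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "GPUs allocated to this job: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```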
 To list available devices, use:

Institute of Formal and Applied Linguistics Wiki