| twister1; twister2; kronos | Tesla K40c          |  cc3.5 |   1|  12 GB |  |
| dll1; dll2                 | GeForce GTX 1080    |  cc6.1 |   8|   8 GB |  |
| titan                      | GeForce GTX 1080    |  cc6.1 |   1|   8 GB |  |
| dll3; dll4; dll5           | GeForce GTX 1080 Ti |  cc6.1 |  10|  11 GB | dll3 has only 9 GPUs since 2017/07 |
| dll6                       | GeForce GTX 1080 Ti |  cc6.1 |   3|  11 GB |  |
  
Desktop machines:
- ''/net/cluster/TMP'' - NFS hard disk for temporary files, so slower than Lustre for most tasks
- ''/net/cluster/SSD'' - also NFS, but faster than TMP because of SSD
- ''/COMP.TMP'' - local (per-machine) space for temporary files (use it instead of ''/tmp''; over-filling ''/COMP.TMP'' should not halt the system); see the example below.
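
For instance, to point tools that honor the standard ''TMPDIR'' variable at this local disk (a sketch; the per-user subdirectory is only a suggested convention, not a cluster rule):

  mkdir -p /COMP.TMP/$USER      # your own subdirectory on the local disk
  export TMPDIR=/COMP.TMP/$USER # most Unix tools create their temp files here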
  
=== Individual acquisitions: NVIDIA Academic Hardware Grants ===
  
This section explains how to use the cluster properly.

==== Set up CUDA and cuDNN ====

You can add the following commands to your ''~/.bashrc'':

  CUDNN_version=6.0
  CUDA_version=8.0
  CUDA_DIR_OPT=/opt/cuda-$CUDA_version
  if [ -d "$CUDA_DIR_OPT" ] ; then
    CUDA_DIR=$CUDA_DIR_OPT
    export CUDA_HOME=$CUDA_DIR
    export THEANO_FLAGS="cuda.root=$CUDA_HOME,device=gpu,floatX=float32"
    export PATH=$PATH:$CUDA_DIR/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_DIR/cudnn/$CUDNN_version/lib64:$CUDA_DIR/lib64
    export CPATH=$CUDA_DIR/cudnn/$CUDNN_version/include:$CPATH
  fi
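
After re-login (or after ''source ~/.bashrc''), you can check that the set-up took effect; ''nvcc'' lives in the CUDA ''bin'' directory, so it should now be on your ''PATH'':

  echo $CUDA_HOME   # should print /opt/cuda-8.0
  nvcc --version    # reports the CUDA compiler version
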
==== TensorFlow Environment ====
  
  
This environment has TensorFlow 1.0 and all the necessary requirements for NeuralMonkey.
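
To check that TensorFlow in this environment sees the GPUs (a generic TF 1.x check, not specific to this cluster; run it in a job that actually has a GPU allocated):

  python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
    # besides the CPU, the listing should include one entry per visible GPU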

==== PyTorch Environment ====

If you want to use PyTorch, there is a ready-made environment in

  /home/hajicj/anaconda3/envs/pytorch/bin

It relies on the CUDA and cuDNN setup above.
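
To check that PyTorch there can use the GPU (standard PyTorch calls, nothing cluster-specific):

  /home/hajicj/anaconda3/envs/pytorch/bin/python -c "import torch; print(torch.cuda.is_available())"
    # prints True if a usable GPU (and a matching CUDA) is found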
  
==== Using the cluster ====
Some useful diagnostic commands:

  /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
    # shows CUDA capability etc.
  ssh dll1 ~popel/bin/gpu_allocations
    # who occupies which card on a given machine
          
=== Select GPU device ===
  
The variable CUDA_VISIBLE_DEVICES constrains TensorFlow and other toolkits to compute only on the selected GPUs. **Do not set this variable yourself** (unless debugging SGE); it is set automatically by SGE if you ask for GPUs (see above).
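
For example, in an SGE job that was granted two GPUs you might see (values are illustrative only):

  echo $CUDA_VISIBLE_DEVICES
    # e.g. "0,2" = the job may use only the first and the third GPU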
  
To list available devices, you can use for example:
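
  nvidia-smi
    # standard NVIDIA tool; lists the GPUs in the machine and the processes using them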
