| twister1; twister2; kronos | Tesla K40c          |  cc3.5 |   1|  12 GB |  |
| dll1; dll2                 | GeForce GTX 1080    |  cc6.1 |   8|   8 GB |  |
| titan                      | GeForce GTX 1080    |  cc6.1 |   1|   8 GB |  |
| dll3; dll4; dll5           | GeForce GTX 1080 Ti |  cc6.1 |  10|  11 GB | dll3 has only 9 GPUs since 2017/07 |
| dll6                       | GeForce GTX 1080 Ti |  cc6.1 |   3|  11 GB |  |
  
Desktop machines:
- ''/net/cluster/TMP'' - NFS hard disk for temporary files, so slower than Lustre for most tasks
- ''/net/cluster/SSD'' - also NFS, but faster than TMP because of SSD
- ''/COMP.TMP'' - local (per-machine) space for temporary files (use it instead of ''/tmp''; over-filling ''/COMP.TMP'' should not halt the system); see the example below.
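
For instance, to point tools that honor the standard ''TMPDIR'' variable at this local disk (a sketch; the per-user subdirectory is only a suggested convention, not a cluster rule):

  mkdir -p /COMP.TMP/$USER      # your own subdirectory on the local disk
  export TMPDIR=/COMP.TMP/$USER # most Unix tools create their temp files here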
  
=== Individual acquisitions: NVIDIA Academic Hardware Grants ===
  
This section explains how to use the cluster properly.

==== Set up CUDA and cuDNN ====

You can add the following commands to your ''~/.bashrc'':

  CUDNN_version=6.0
  CUDA_version=8.0
  CUDA_DIR_OPT=/opt/cuda-$CUDA_version
  if [ -d "$CUDA_DIR_OPT" ] ; then
    CUDA_DIR=$CUDA_DIR_OPT
    export CUDA_HOME=$CUDA_DIR
    export THEANO_FLAGS="cuda.root=$CUDA_HOME,device=gpu,floatX=float32"
    export PATH=$PATH:$CUDA_DIR/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_DIR/cudnn/$CUDNN_version/lib64:$CUDA_DIR/lib64
    export CPATH=$CUDA_DIR/cudnn/$CUDNN_version/include:$CPATH
  fi
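
After re-login (or after ''source ~/.bashrc''), you can check that the set-up took effect; ''nvcc'' lives in the CUDA ''bin'' directory, so it should now be on your ''PATH'':

  echo $CUDA_HOME   # should print /opt/cuda-8.0
  nvcc --version    # reports the CUDA compiler version
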
==== TensorFlow Environment ====
  
  
This environment has TensorFlow 1.0 and all the necessary requirements for NeuralMonkey.
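
To check that TensorFlow in this environment sees the GPUs (a generic TF 1.x check, not specific to this cluster; run it in a job that actually has a GPU allocated):

  python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
    # besides the CPU, the listing should include one entry per visible GPU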

==== PyTorch Environment ====

If you want to use PyTorch, there is a ready-made environment in

  /home/hajicj/anaconda3/envs/pytorch/bin

It relies on the CUDA and cuDNN setup above.
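
To check that PyTorch there can use the GPU (standard PyTorch calls, nothing cluster-specific):

  /home/hajicj/anaconda3/envs/pytorch/bin/python -c "import torch; print(torch.cuda.is_available())"
    # prints True if a usable GPU (and a matching CUDA) is found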
  
==== Using the cluster ====
Some useful diagnostic commands:

  /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
    # shows CUDA capability etc.
  ssh dll1 ~popel/bin/gpu_allocations
    # who occupies which card on a given machine
          
=== Select GPU device ===
  
The variable CUDA_VISIBLE_DEVICES constrains TensorFlow and other toolkits to compute only on the selected GPUs. **Do not set this variable yourself** (unless debugging SGE); it is set automatically by SGE if you ask for GPUs (see above).
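
For example, in an SGE job that was granted two GPUs you might see (values are illustrative only):

  echo $CUDA_VISIBLE_DEVICES
    # e.g. "0,2" = the job may use only the first and the third GPU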
  
To list available devices, you can use for example:
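
  nvidia-smi
    # standard NVIDIA tool; lists the GPUs in the machine and the processes using them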
