
Institute of Formal and Applied Linguistics Wiki


Some machines are at Malá Strana (ground floor, new server room built from the Lindat budget), some are at Troja (5 km north-east).
If you need to quickly distinguish which machine is located where, you can use your knowledge of [[https://en.wikipedia.org/wiki/Trojan_War|Trojan War]]-related heroes, ''qhost -q'', or the tables below.
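
For a quick check from the command line, ''qhost -q'' lists every execution host together with the queues configured on it, so you can see whether a machine belongs to ''cpu-troja.q'' or ''cpu-ms.q'':

<code>
# show all hosts with their queues (plus load and memory),
# e.g. to find out whether a host is served by cpu-troja.q or cpu-ms.q
qhost -q
</code>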

====== AVX instructions ======

==== Troja (cpu-troja.q) ====
^ Name                ^ CPU type                                  ^ GHz ^ cores ^ RAM (GB) ^ note ^
| achilles[1-8]       | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  | AVX enabled |
| hector[1-8]         | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  | AVX enabled |
| helena[1-8]         | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  | AVX enabled |
| paris[1-8]          | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  | AVX enabled |
  
==== MS = Malá Strana (cpu-ms.q) ====
^ Name                ^ CPU type                        ^ GHz ^ cores ^ RAM (GB) ^ note ^
| andromeda[1-13]     | AMD Opteron                     | 2.8 |    7 |   30  |      |
| lucifer[1-10]       | Intel(R) Xeon(R) CPU E5620      | 2.4 |   15 |  122  |      |
| hydra[1-4]          | AMD Opteron SSE4 AVX            | 2.6 |   15 |  122  | AVX enabled |
| orion[1-8]          | Intel(R) Xeon(R) CPU E5-2630 v4 | 2.2 |   39 |  122  | AVX enabled |
| cosmos              | Intel Xeon                      | 2.9 |    7 |  249  |      |
| belzebub            | Intel Xeon SSE4 AVX             | 2.9 |   31 |  249  | AVX enabled |
| iridium             | Intel Xeon SSE4                 | 1.9 |   15 |  501  |      |
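
If you want to verify directly on a node (e.g. at the beginning of a job) whether AVX is really available, you can check the CPU flags; this is a generic Linux check, not anything ÚFAL-specific:

<code>
# a non-zero count means the CPU advertises AVX support
grep -c avx /proc/cpuinfo
# list the exact AVX-related flags (avx, avx2, ...)
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
</code>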
  
  
  export LC_ALL=en_US.UTF-8

If you are curious about the purpose of ''.bashrc'' and ''.bash_profile'' and when each of them is used, you may read [[https://stackoverflow.com/a/415444|this]].
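
A common convention (explained in the linked answer; shown here only as a sketch to adapt to your own dotfiles) is to keep your settings in ''.bashrc'' and have ''.bash_profile'' source it, so that both login and non-login shells see them:

<code>
# ~/.bash_profile -- read by login shells only;
# delegate to ~/.bashrc so interactive non-login shells get the same settings
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
</code>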
   ​   ​
===== Basic usage =====
  # prepare a shell script describing your task
  qsub -cwd -j y script.sh Hello World
  # This submits your job to the default queue, which is currently ''cpu-*.q''.
  # Usually, there is a free slot, so the job will be scheduled within a few seconds.
  # We have used two handy qsub parameters:
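
For illustration, ''script.sh'' from the example above could be as simple as the sketch below; its content is just an assumption (any shell script will do), the point being that the words after the script name arrive as its arguments:

<code>
#!/bin/bash
# minimal job script: report where the job runs and print its arguments
# ("Hello World" in the qsub command above)
echo "Running on $(hostname) in $(pwd)"
echo "Arguments: $@"
</code>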
See ''man complex'' (run it on lrc or sol machines) for a list of possible resources you may require (in addition to ''mem_free'' etc. discussed above).
  
''qsub **-p** -200''
Define the priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. In January 2018, we changed the default to -100 (it used to be 0). Please do not use a priority between -99 and 0 for jobs taking longer than a few hours, unless it is absolutely necessary for a deadline; in that case, please notify other GPU users. You should ask for a lower priority (-1024..-101) if you submit many jobs at once or if the jobs are not urgent. SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called ''prior'', reported in ''qstat'', which grows as the job waits in the queue). Note that once a job is started, you cannot "unschedule" it, so from that moment on its priority is irrelevant.
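
For example, a non-urgent job can be submitted with an explicitly lowered priority; the value -500 below is an arbitrary choice from the recommended -1024..-101 range:

<code>
# lower the priority so the job does not delay other users' jobs in the queue
qsub -p -500 -cwd -j y script.sh
</code>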
  
''qsub **-o** LOG.stdout **-e** LOG.stderr''
=== Ssh to random sol ===
Ondřej Bojar suggests adding the following alias to your .bashrc (cf. [[#sshcwd]]):
<code>alias cluster='comp=$(( (RANDOM % 10) + 1 )); ssh -o "StrictHostKeyChecking no" sol$comp'</code>
  
===== Job monitoring =====
  * ''qstat [-u user]'' -- print a list of running/waiting jobs of a given user (see the examples after this list)
  * ''qhost'' -- print available/total resources
  * ''qacct -j job_id'' -- print info even for an ended job (for which ''qstat -j job_id'' does not work). See ''man qacct'' for more.
  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage of a given user
  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- required resources of all users
  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- memory usage of all nodes
    * mem_total:
    * mem_free: total memory minus reserved memory (using ''qsub -l mem_free'') for each node
    * act_mem_free: really free memory
    * mem_used: really used memory
  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary (with per-user stats for users with running jobs)
  * ''cat /opt/LRC/REPORTER/LRC-UFAL/stats/userlist.weight'' -- all users sorted according to their activity (number of submitted jobs × their average duration), updated each night
  * [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/lrc-headnode.ufal.hide.ms.mff.cuni.cz/index.html|Munin: graph of cluster usage by day and user]] and [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/apophis.ufal.hide.ms.mff.cuni.cz/index.html|Munin monitoring of Apophis disk server]] (both accessible only from ÚFAL network)
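
A few concrete invocations of the commands above (the job id 123456 is made up):

<code>
qstat -u $USER     # my running and waiting jobs
qhost              # load and memory of all nodes
qstat -j 123456    # details of a job that is still waiting or running
qacct -j 123456    # accounting info of a job that has already finished
</code>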
  
