Differences

This shows you the differences between two versions of the page.

--- grid [2018/06/26 16:21] vodrazka [MS = Malá Strana (cpu-ms.q)]
+++ grid [2018/11/14 19:08] popel [Advanced usage] removed duplication
@@ Line 10: / Line 10: @@
 Some machines are at Malá Strana (ground floor, new server room built from Lindat budget), some are at Troja (5 km north-east).
 If you need to quickly distinguish which machine is located where, you can use your knowledge of [[https://en.wikipedia.org/wiki/Trojan_War|Trojan war]]-related heroes, ''qhost -q'', or the tables below.
+====== AVX instructions ======
 ==== Troja (cpu-troja.q) ====
 ^ Name                ^ CPU type                                  ^ GHz ^cores ^RAM(GB)^ note ^
-| achilles[1-8]       | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |      |
+| achilles[1-8]       | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |  AVX enabled    |
-| hector[1-8]         | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |      |
+| hector[1-8]         | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |  AVX enabled    |
-| helena[1-8]         | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |      |
+| helena[1-8]         | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |  AVX enabled    |
-| paris[1-8]          | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |      |
+| paris[1-8]          | Intel(R) Xeon(R) CPU E5-2630 v3           | 2.4 |   31 |  123  |  AVX enabled    |
 ==== MS = Malá Strana (cpu-ms.q) ====
@@ Line 23: / Line 25: @@
 | andromeda[1-13]     | AMD Opteron                     | 2.8 |    7 |   30  |      |
 | lucifer[1-10]       | Intel(R) Xeon(R) CPU E5620      | 2.4 |   15 |  122  |      |
-| hydra[1-4]          | AMD Opteron SSE4 AVX            | 2.6 |   15 |  122  |      |
+| hydra[1-4]          | AMD Opteron SSE4 AVX            | 2.6 |   15 |  122  |   AVX enabled   |
-| orion[1-8]          | Intel(R) Xeon(R) CPU E5-2630 v4 | 2.2 |   39 |  122  |      |
+| orion[1-8]          | Intel(R) Xeon(R) CPU E5-2630 v4 | 2.2 |   39 |  122  |   AVX enabled   |
 | cosmos              | Intel Xeon                      | 2.9 |    7 |  249  |      |
-| belzebub            | Intel Xeon SSE4 AVX             | 2.9 |   31 |  249  |      |
+| belzebub            | Intel Xeon SSE4 AVX             | 2.9 |   31 |  249  |   AVX enabled   |
 | iridium             | Intel Xeon SSE4                 | 1.9 |   15 |  501  |      |
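The "AVX enabled" notes in these tables can be cross-checked on a node itself. The following is a minimal sketch, assuming a Linux machine that lists its CPU flags in /proc/cpuinfo; the function name has_avx is hypothetical and not part of the cluster tooling.

```shell
# Hypothetical helper: succeeds if the CPU flags in the given cpuinfo file
# (default: /proc/cpuinfo, a Linux-specific path) contain the word "avx".
has_avx() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # -w matches "avx" as a whole word, so "avx2" alone does not count.
  grep -E '^flags' "$cpuinfo" | grep -qw 'avx'
}

# Usage (run on the node you want to check):
#   if has_avx; then echo "AVX available"; else echo "no AVX"; fi
```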
@@ Line 80: / Line 82: @@
   export LC_ALL=en_US.UTF-8
+If you are curious about the purpose of .bashrc and .bash_profile and want to know when each of them is used, you may read [[https://stackoverflow.com/a/415444|this]].
 ===== Basic usage =====
@@ Line 91: / Line 95: @@
   # prepare a shell script describing your task
 qsub -cwd -j y script.sh Hello World
-  # This submits your job to the default queue, which is currently ''cpu-ms.q''.
+  # This submits your job to the default queue, which is currently ''cpu-*.q''.
   # Usually, there is a free slot, so the job will be scheduled within a few seconds.
   # We have used two handy qsub parameters:
@@ Line 172: / Line 176: @@
 See ''man complex'' (run it on lrc or sol machines) for a list of possible resources you may require (in addition to ''mem_free'' etc. discussed above).
-''qsub **-p** -99''
+''qsub **-p** -200''
-Define a priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. In January 2018, we changed the default to -100 (it used to be 0). SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called ''prior'', which is reported in ''qstat'', which grows as the job is waiting in the queue). Note that once a job is started, you cannot "unschedule" it, so from that moment on, it is irrelevant what was its priority. You can ask for a higher priority (-99...0) if your job is urgent and/or will finish soon and you want to skip your colleagues' jobs in the queue. You should ask for lower priority (-1024..-101) if you submit many jobs at once or if the jobs are not urgent.
+Define the priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. In January 2018, we changed the default to -100 (it used to be 0). Please do not use a priority between -99 and 0 for jobs taking longer than a few hours, unless it is absolutely necessary for a deadline. In that case, please notify other GPU users. You should ask for a lower priority (-1024..-101) if you submit many jobs at once or if the jobs are not urgent. SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called ''prior'', which is reported in ''qstat'' and grows while the job waits in the queue). Note that once a job is started, you cannot "unschedule" it, so from that moment on its priority is irrelevant.
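The priority ranges described above can be summarized in a small check to run before submitting. This is only a sketch: the function name priority_class and its labels are hypothetical, and the ranges are taken directly from the text (-99..0 for urgent jobs, -100 as the default, -1024..-101 for bulk or non-urgent jobs).

```shell
# Hypothetical helper: classify a value for `qsub -p` according to the
# ranges described in the text above.
priority_class() {
  p="$1"
  # Reject anything that is not an optionally negative integer.
  case "$p" in
    ''|-|*[!0-9-]*|?*-*) echo "invalid"; return 1 ;;
  esac
  if [ "$p" -gt 0 ]; then
    echo "admin-only"         # only SGE admins may go above 0
  elif [ "$p" -ge -99 ]; then
    echo "urgent"             # -99..0: short or deadline-critical jobs
  elif [ "$p" -eq -100 ]; then
    echo "default"            # the default since January 2018
  elif [ "$p" -ge -1024 ]; then
    echo "low"                # -1024..-101: many jobs or not urgent
  else
    echo "invalid"; return 1  # below the allowed minimum
  fi
}

# Usage:
#   priority_class -200   # prints "low"
#   qsub -p -200 -cwd -j y script.sh
```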
 ''qsub **-o** LOG.stdout **-e** LOG.stderr''
@@ Line 288: / Line 292: @@
 === Ssh to random sol ===
 Ondřej Bojar suggests adding the following alias to your .bashrc (cf. [[#sshcwd]]):
-<code>alias cluster='comp=$(($RANDOM /4095 +1)); ssh -o "StrictHostKeyChecking no" sol$comp'</code>
+<code>alias cluster='comp=$(( (RANDOM % 10) +1)); ssh -o "StrictHostKeyChecking no" sol$comp'</code>
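The arithmetic in the new alias can be read as follows: in bash, RANDOM yields an integer from 0 to 32767, ''RANDOM % 10'' maps it to 0..9, and adding 1 gives 1..10. The sketch below wraps the same expression in a function; the name pick_sol is hypothetical, and it assumes the machines are named sol1 to sol10, as the alias implies.

```shell
# Hypothetical helper: print a host name sol1..sol10 chosen at random,
# using the same arithmetic as the alias above (bash's RANDOM is 0..32767).
pick_sol() {
  echo "sol$(( (RANDOM % 10) + 1 ))"
}

# Usage:
#   ssh -o "StrictHostKeyChecking no" "$(pick_sol)"
```

The distribution is very slightly uneven because 32768 is not a multiple of 10, which should not matter for picking a login machine.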
 ===== Job monitoring =====
@@ Line 294: / Line 298: @@
   * ''qstat [-u user]'' -- print a list of running/waiting jobs of a given user
   * ''qhost'' -- print available/total resources
-  * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage of a given user
+  * ''qacct -j job_id'' -- print info even for a job that has already ended (for which ''qstat -j job_id'' does not work). See ''man qacct'' for more.
-  * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- required resources of all users
-  * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- memory usage of all nodes
+  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage of a given user
+  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- required resources of all users
+  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- memory usage of all nodes
     * mem_total:
     * mem_free: total memory minus reserved memory (using ''qsub -l mem_free'') for each node
     * act_mem_free: really free memory
     * mem_used: really used memory
-  * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary (with per-user stats for users with running jobs)
+  * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary (with per-user stats for users with running jobs)
-  * ''cat /SGE/REPORTER/LRC-UFAL/stats/userlist.weight'' -- all users sorted according to their activity (number of submitted jobs × their average duration), updated each night
+  * ''cat /opt/LRC/REPORTER/LRC-UFAL/stats/userlist.weight'' -- all users sorted according to their activity (number of submitted jobs × their average duration), updated each night
   * [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/lrc-headnode.ufal.hide.ms.mff.cuni.cz/index.html|Munin: graph of cluster usage by day and user]] and  [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/apophis.ufal.hide.ms.mff.cuni.cz/index.html|Munin monitoring of Apophis disk server]] (both accessible only from ÚFAL network)
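Since ''qstat -j'' only works for pending or running jobs and ''qacct -j'' covers ended ones, the two can be combined. The following is a minimal sketch, not an established cluster tool: the function name job_info is hypothetical, and it assumes that ''qstat -j'' exits with a non-zero status when the job is no longer known to the scheduler.

```shell
# Hypothetical helper: show details of a job, whether it is still
# known to the scheduler or has already ended.
job_info() {
  job_id="$1"
  # Assumption: qstat exits non-zero once the job is no longer queued or running;
  # in that case fall back to the accounting records.
  qstat -j "$job_id" 2>/dev/null || qacct -j "$job_id"
}

# Usage:
#   job_info 123456
```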


Institute of Formal and Applied Linguistics Wiki
