Both sides previous revision
Previous revision
Next revision
|
Previous revision
Next revision
Both sides next revision
|
grid [2018/06/26 16:21] vodrazka [MS = Malá Strana (cpu-ms.q)] |
grid [2018/07/15 21:18] popel [Job monitoring] |
=== Ssh to random sol === | === Ssh to random sol === |
Ondřej Bojar suggests to add the following alias to your .bashrc (cf. [[#sshcwd]]): | Ondřej Bojar suggests to add the following alias to your .bashrc (cf. [[#sshcwd]]): |
<code>alias cluster='comp=$(($RANDOM /4095 +1)); ssh -o "StrictHostKeyChecking no" sol$comp'</code> | <code>alias cluster='comp=$(( (RANDOM % 10) +1)); ssh -o "StrictHostKeyChecking no" sol$comp'</code> |
| |
===== Job monitoring ===== | ===== Job monitoring ===== |
* ''qstat [-u user]'' -- print a list of running/waiting jobs of a given user | * ''qstat [-u user]'' -- print a list of running/waiting jobs of a given user |
* ''qhost'' -- print available/total resources | * ''qhost'' -- print available/total resources |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage of a given user | * ''qacct -j job_id'' -- print info even for ended job (for which ''qstat -j job_id'' does not work). See ''man qacct'' for more. |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- required resources of all users | |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- memory usage of all nodes | * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage of a given user |
| * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- required resources of all users |
| * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- memory usage of all nodes |
* mem_total: | * mem_total: |
* mem_free: total memory minus reserved memory (using ''qsub -l mem_free'') for each node | * mem_free: total memory minus reserved memory (using ''qsub -l mem_free'') for each node |
* act_mem_free: really free memory | * act_mem_free: really free memory |
* mem_used: really used memory | * mem_used: really used memory |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary (with per-user stats for users with running jobs) | * ''/opt/LRC/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary (with per-user stats for users with running jobs) |
* ''cat /SGE/REPORTER/LRC-UFAL/stats/userlist.weight'' -- all users sorted according to their activity (number of submitted jobs × their average duration), updated each night | * ''cat /opt/LRC/REPORTER/LRC-UFAL/stats/userlist.weight'' -- all users sorted according to their activity (number of submitted jobs × their average duration), updated each night |
* [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/lrc-headnode.ufal.hide.ms.mff.cuni.cz/index.html|Munin: graph of cluster usage by day and user]] and [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/apophis.ufal.hide.ms.mff.cuni.cz/index.html|Munin monitoring of Apophis disk server]] (both accessible only from ÚFAL network) | * [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/lrc-headnode.ufal.hide.ms.mff.cuni.cz/index.html|Munin: graph of cluster usage by day and user]] and [[http://ufaladm2/munin/ufal.hide.ms.mff.cuni.cz/apophis.ufal.hide.ms.mff.cuni.cz/index.html|Munin monitoring of Apophis disk server]] (both accessible only from ÚFAL network) |
| |