grid [2017/10/02 17:08] popel |
grid [2017/10/02 17:28] popel [Job monitoring] |
===== Job monitoring ===== | ===== Job monitoring ===== |
| |
* ''qstat [-u user]'' -- list of jobs currently running / queued | * ''qstat [-u user]'' -- print a list of running/waiting jobs of a given user |
* ''qhost'' -- available resources | * ''qhost'' -- print available/total resources |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage by the user | * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_real_mem_usage -u user -w'' -- current memory usage of a given user |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- resources requested by users | * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_users_limits_requested -w'' -- required resources of all users |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- list of all nodes and their memory utilization. | * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_nodes_meminfo'' -- memory usage of all nodes |
* mem_total: total memory of the node | * mem_total: |
* mem_free: i.e. how much of the node's memory quota is still free | * mem_free: total memory minus reserved memory (using ''qsub -l mem_free'') for each node |
* act_mem_free: how much free memory the node REALLY has left | * act_mem_free: really free memory |
* mem_used: how much memory is actually used | * mem_used: really used memory |
* ''/SGE/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary of the cluster | * ''/SGE/REPORTER/LRC-UFAL/bin/lrc_state_overview'' -- overall summary (with per-user stats for users with running jobs) |
* total number of cores, number of cores in use | * ''cat /SGE/REPORTER/LRC-UFAL/stats/userlist.weight'' -- all users sorted according to their activity (number of submitted jobs × their average duration), updated each night |
* total RAM size, how much of it is physically unused, how much of it is not yet reserved | * [[https://ufaladm2.ufal.hide.ms.mff.cuni.cz/munin/ufal.hide.ms.mff.cuni.cz/lrc1.ufal.hide.ms.mff.cuni.cz/lrc_users.html|Munin: graph of cluster usage by day and user]] (accessible only from ÚFAL network, after accepting a security exception) |
* per user (those currently computing) -- how many jobs they have running, how many queued, and how many of those are in the hold state | |
* ''cat /SGE/REPORTER/LRC-UFAL/stats/userlist.weight'' -- list of cluster users sorted by their activity so far (number of submitted jobs × the time they ran), updated every night | |
* [[https://ufaladm2.ufal.hide.ms.mff.cuni.cz/munin/ufal.hide.ms.mff.cuni.cz/lrc1.ufal.hide.ms.mff.cuni.cz/lrc_users.html|Munin: graph of cluster load by user]] (visible only from the ÚFAL network) | |
| |
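The ''qstat'' listing above can be summarized with a small filter. The following is only a sketch: the function name ''count_job_states'' is hypothetical, and it assumes the default SGE ''qstat'' layout (two header lines, job state in the fifth column), which may differ on a customized installation.

```shell
#!/bin/bash
# Count jobs per state from `qstat` output read on stdin.
# Assumption: default SGE layout, i.e. two header lines and the
# job state (r, qw, hqw, ...) in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Typical use would be ''qstat -u "$USER" | count_job_states'', which prints one line per state with the number of jobs in it.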
=== Other === | ===== Other ===== |
* You can use environment variables ''$JOB_ID'', ''$JOB_NAME''. | * You can use environment variables ''$JOB_ID'', ''$JOB_NAME''. |
* One job can submit other jobs (but be careful with recursion :-)). A job submitted to the CPU cluster may submit GPU jobs (to the ''gpu.q'' queue). | * One job can submit other jobs (but be careful with recursion :-)). A job submitted to the CPU cluster may submit GPU jobs (to the ''gpu.q'' queue). |
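A minimal sketch of how the two points above might be combined inside a job script. The helper names ''job_log_name'' and ''submit_child'' and the variables ''CHAIN_DEPTH'' and ''MAX_DEPTH'' are hypothetical (they are not set by SGE); only ''$JOB_ID'', ''$JOB_NAME'' and ''qsub -v'' come from the grid engine itself. The depth counter is one possible guard against runaway recursive submission.

```shell
#!/bin/bash
# Build a log file name from the variables SGE sets inside a running job.
# The fallbacks are assumptions for runs outside the grid.
job_log_name() {
    echo "${JOB_NAME:-interactive}.${JOB_ID:-nojob}.log"
}

# Submit a follow-up job unless the chain is already too deep.
# CHAIN_DEPTH and MAX_DEPTH are our own variables, not SGE's;
# `qsub -v` passes the incremented counter to the child job.
submit_child() {
    local script=$1
    local depth=${CHAIN_DEPTH:-0}
    local max=${MAX_DEPTH:-3}
    if [ "$depth" -ge "$max" ]; then
        echo "depth limit $max reached, not submitting $script" >&2
        return 1
    fi
    qsub -v CHAIN_DEPTH=$((depth + 1)) "$script"
}
```

Inside a job, ''submit_child next_step.sh'' would then submit ''next_step.sh'' (a placeholder name) only while the chain is shorter than ''MAX_DEPTH''.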