Differences
grid [2017/09/27 21:16] popel [Advanced usage]
grid [2017/10/02 17:28] popel [Job monitoring]
Line 70: | Line 70:
# We have used two handy qsub parameters:
# -cwd ... the script is executed in the current directory (the default is your home directory)
- # -j y ... stdout and stderr outputs are merged and redirected to a file (''
+ # -j y ... stdout and stderr outputs are merged and redirected to a file (''
# We have also provided two parameters for our script "
# The qsub prints something like
Line 148: | Line 148:
''
Define a priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. The default is 0, i.e. the highest possible priority. SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called ''
+
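The allowed priority range can be checked before submitting. Below is a minimal bash sketch of a wrapper that rejects anything outside -1024..0 and then calls `qsub -p`. The name `submit_with_priority` and the `DRY_RUN` switch are hypothetical and not part of SGE. With `DRY_RUN=1` the wrapper only prints the qsub command, so you can try it on a machine without SGE.

```shell
# Hypothetical helper (not part of SGE): check that a priority is in the
# range allowed for ordinary users (-1024..0) and submit a job with it.
# Usage: submit_with_priority PRIORITY SCRIPT [SCRIPT_ARGS...]
# With DRY_RUN=1 the qsub command is only printed, not executed.
submit_with_priority() {
    local prio=$1 re='^-?(0|[1-9][0-9]*)$'
    shift
    # Integers only; leading zeros are rejected because bash arithmetic reads them as octal
    if ! [[ $prio =~ $re ]]; then
        echo "priority must be an integer, got '$prio'" >&2
        return 2
    fi
    if (( prio < -1024 || prio > 0 )); then
        echo "priority $prio is outside -1024..0 (higher values are for admins)" >&2
        return 2
    fi
    local cmd=(qsub -p "$prio" "$@")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 submit_with_priority -100 my_job.sh` prints `qsub -p -100 my_job.sh`, while a priority of 5 is refused.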
''
- redirect std{out,
+ redirect std{out,
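Here is a sketch of sending stdout and stderr to separate files in a log directory. It assumes that SGE expands its pseudo variables `$JOB_NAME` and `$JOB_ID` in `-o`/`-e` paths (please check `man qsub` on your cluster). Those variables must be single-quoted so that your own shell does not expand them first. The sketch also assumes that SGE does not create a missing output directory, so it creates one beforehand. `submit_logged`, `LOGDIR` and `DRY_RUN` are hypothetical names.

```shell
# Hypothetical helper (not part of SGE): submit a job with stdout and stderr
# in separate files under a log directory (default: logs/).
# With DRY_RUN=1 the qsub command is only printed, not executed.
submit_logged() {
    local logdir=${LOGDIR:-logs}
    # Create the directory first (assumption: SGE will not create it for you)
    mkdir -p "$logdir" || return 1
    # Single quotes keep $JOB_NAME and $JOB_ID literal, so that SGE (not
    # your shell) expands them when the job starts
    local cmd=(qsub -cwd
        -o "$logdir/"'$JOB_NAME.o$JOB_ID'
        -e "$logdir/"'$JOB_NAME.e$JOB_ID'
        "$@")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

With `DRY_RUN=1 submit_logged job.sh` the printed command contains `-o logs/$JOB_NAME.o$JOB_ID` literally, which is what qsub should receive.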
''
Line 191: | Line 192:
You can change some properties of already submitted jobs that are still waiting in the queue (//
- ''
+ ''
Find out all the gory details that are missing here. You'll have to do it one day anyway :-).
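When the same change has to be applied to many waiting jobs, a small filter over `qstat` output can help. This is a hedged sketch. It reads the default `qstat` table (job ID in the first column, state in the fifth) and prints one `qalter` command for each job whose state is exactly `qw`. It does not run those commands. `pending_qalter` is a hypothetical name, and `-p -500` in the usage note is only an example option.

```shell
# Hypothetical helper (not part of SGE): read plain `qstat` output on stdin
# and print one qalter command per job in state "qw" (queued, waiting).
# Arguments are the qalter options to apply, e.g. -p -500.
# Nothing is executed; review the output, then pipe it to sh if it looks right.
pending_qalter() {
    # Skip the two header lines; column 1 is the job ID, column 5 the state
    awk -v opts="$*" 'NR > 2 && $5 == "qw" { print "qalter " opts " " $1 }'
}
```

Usage: `qstat | pending_qalter -p -500` shows the commands. `qstat | pending_qalter -p -500 | sh` runs them. Jobs in other states (`r`, `hqw`, ...) are skipped by design.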
Line 200: | Line 201:
=== qunhold ===
''
+
+ === sshcwd ===
+ This is useful not only when sshing to sol machines. Add the following lines to your ''
+
+ <
+ function sshcwd () {
+     # save the current history so that it is available
+     # immediately on the remote machine
+     history -a;
+     # set up the working directory by setting WD
+     ssh -X -Y -C -t "$@" "
+ }
+
+ # use WD to set up the working directory
+ if [ -n "
+     echo "
+     cd "$WD";
+ fi
+
+ alias sol1="
+ </
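The remote half of the trick (the `if [ -n ...` block) depends only on the `WD` environment variable. So you can try that logic locally, without ssh, by starting a child bash with `WD` set. `cd_to_wd` is a hypothetical stand-alone version of those lines. It quotes `$WD` so that paths containing spaces also work.

```shell
# Hypothetical stand-alone version of the WD logic from the .bashrc snippet:
# if WD is set and non-empty, change into that directory.
cd_to_wd() {
    if [ -n "$WD" ]; then
        cd "$WD" || return 1
    fi
}

# Local check without ssh: a child bash started with WD set should print
# that directory as its working directory.
demo_dir=$(mktemp -d)
WD=$demo_dir bash -c "$(declare -f cd_to_wd); cd_to_wd; pwd"
```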
=== In-script options ===
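Here is a minimal sketch of a job script that carries its qsub options, including the `-cwd` and `-j y` used earlier, in `#$` lines. qsub reads these lines as options, and bash ignores them as comments, so the script can also be run directly for testing. The job name `my_job` and the script body are placeholders. `-S /bin/bash` asks SGE to interpret the script with bash.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -N my_job
# qsub reads the "#$" lines above as options (the same as passing them on the
# command line); bash itself ignores them as ordinary comments.

main() {
    # JOB_ID is set by SGE; when the script is run by hand it is empty
    echo "job ${JOB_ID:-local}: processing $# argument(s): $*"
}

main "$@"
```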
Line 215: | Line 237:
=== Array jobs ===
- If you have a set of tasks (of the same type) and want to run them on multiple machines, use ''
+ If you have a set of tasks (of the same type) and want to run them on multiple machines, use ''
- TODO
+ * ''
+ * environment variable ''
+ * output and error files ''
+ * ''
+ * environment variables ''
+ * ''
+ * ''
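Here is a sketch of how one array task can select its share of the work, assuming one input item per line of a file. When you submit with, for example, `qsub -t 1-100:10 job.sh`, SGE sets `SGE_TASK_ID` to the first index of each chunk, `SGE_TASK_STEPSIZE` to 10 and `SGE_TASK_LAST` to 100. Each task can therefore handle lines `SGE_TASK_ID` to `SGE_TASK_ID+SGE_TASK_STEPSIZE-1`, capped at `SGE_TASK_LAST`. As far as I know, SGE sets `SGE_TASK_ID` to `undefined` for non-array jobs, so the sketch falls back to 1. `task_lines` and the input file are hypothetical.

```shell
# Hypothetical helper (not part of SGE): print the lines of FILE that the
# current array task should process, i.e. lines ID .. ID+STEPSIZE-1
# (capped at SGE_TASK_LAST).
task_lines() {
    local file=$1 num='^[0-9]+$'
    local id=${SGE_TASK_ID:-1} step=${SGE_TASK_STEPSIZE:-1}
    # Outside an array job SGE_TASK_ID is "undefined"; fall back to 1
    [[ $id =~ $num ]] || id=1
    [[ $step =~ $num ]] || step=1
    local last=$(( id + step - 1 ))
    # Do not run past the end of the requested range
    if [[ ${SGE_TASK_LAST:-} =~ $num ]] && (( SGE_TASK_LAST < last )); then
        last=$SGE_TASK_LAST
    fi
    sed -n "${id},${last}p" "$file"
}
```

In a job script this could be used as `task_lines inputs.txt | while IFS= read -r item; do ...; done`.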
- ===== Job monitoring =====
+ ===== Job monitoring
- * ''
+ * ''
- * ''
+ * ''
- * ''/
+ * ''/
- * ''/
+ * ''/
- * ''/
+ * ''/
- * mem_total:
+ * mem_total:
- * mem_free:
+ * mem_free:
- * act_mem_free:
+ * act_mem_free:
- * mem_used:
+ * mem_used:
- * ''/
+ * ''/
- * total number of cores, number of cores in use
+ * ''
- * total size of RAM, how much of it is physically unused, how much of it is not yet reserved
+ * [[https://
- * per individual user
- * ''
- * [[https://
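For a quick overview it can help to count jobs per state. This hedged sketch reads plain `qstat` output (for example `qstat -u '*'` for all users), prints one `STATE COUNT` line per state, and lists the IDs of jobs whose state contains `E` (error, e.g. `Eqw`). It assumes the default column layout, with the state in the fifth column. `qstat_summary` is a hypothetical name.

```shell
# Hypothetical helper (not part of SGE): summarize plain `qstat` output read
# on stdin. Prints "STATE COUNT" lines sorted by state, then the IDs of jobs
# whose state contains "E" (error), e.g. Eqw.
qstat_summary() {
    # Skip the two header lines; column 1 is the job ID, column 5 the state
    awk 'NR > 2 {
             count[$5]++
             if ($5 ~ /E/) err = err " " $1
         }
         END {
             for (s in count) print s, count[s] | "LC_ALL=C sort"
             close("LC_ALL=C sort")
             if (err != "") print "jobs in error state:" err
         }'
}
```

Usage: `qstat -u '*' | qstat_summary`.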
-
- ===== Common and tricky problems =====
-
-
- ==== A submitted job can submit other jobs ====
-
- Dan's earlier experience with a PBS cluster (not SGE) suggested that this is not possible. But it is, at least on our cluster. The commands ''
-
-
-
- ==== Environment variables,
-
- SGE runs scripts in a clean environment. So do not be surprised,
-
- I do not know yet exactly which of the files ''
-
- This also means, for example, that unless you take care of it, the **Java** used is
-
- java version "
- gij (GNU libgcj) version 4.1.2 20070502 (Red Hat 4.1.2-12)
-
- If you want to run the submitted program in your favourite environment (e.g. the settings of ''
-
- ==== bashrc and similar files must not print anything to the console ====
-
- Copied from [[http://
-
- It is important that the files that are sourced during a login, such as .bash_profile, .profile, .bashrc, .login and .cshrc, don't produce any output when a non-interactive login is done. If they do, chances are that your job will run, but that the batch system is unable to deliver the standard output and error files to you. In that case the status of your job will be '
+ ===== Other =====
+ * You can use environment variables ''
+ * One job can submit other jobs (but be careful with recursive:
+ * It is important that the files that are sourced during a login such as .bash_profile,
<
unset INTERACTIVE
Line 273: | Line 274:
fi
</
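One common way to keep a bash startup file silent in batch jobs is to print only when the shell is interactive. Bash signals this by including `i` in `$-`. This is a sketch of that idiom, not necessarily what the snippet shown here does, and the welcome message is a placeholder.

```shell
# Sketch of a ~/.bashrc fragment: anything that prints goes inside this test.
# Bash includes "i" in $- only for interactive shells, so non-interactive
# shells (such as those running batch jobs) print nothing.
if [[ $- == *i* ]]; then
    echo "Welcome on $(hostname)"
fi
```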
-
+ TODO: Is this restriction still true (for our cluster)? E.g. .bash_profile with /
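A job can submit further jobs, so it is easy to create a runaway chain by accident. A hedged sketch of one guard follows. It passes a depth counter to each child job with `qsub -v NAME=value`, which puts the variable into the job's environment, and it refuses to submit beyond a fixed limit. `SUBMIT_DEPTH`, `MAX_DEPTH`, `submit_child` and `DRY_RUN` are hypothetical names. With `DRY_RUN=1` the command is only printed.

```shell
# Hypothetical guard (not part of SGE) against runaway recursive submissions.
# Each child job gets SUBMIT_DEPTH one higher than its parent, passed to the
# job's environment with qsub -v. With DRY_RUN=1 the command is only printed.
MAX_DEPTH=${MAX_DEPTH:-3}

submit_child() {
    local depth=${SUBMIT_DEPTH:-0}
    if (( depth >= MAX_DEPTH )); then
        echo "not submitting: depth $depth reached the limit $MAX_DEPTH" >&2
        return 1
    fi
    local cmd=(qsub -cwd -v "SUBMIT_DEPTH=$(( depth + 1 ))" "$@")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

A top-level call prints `qsub -cwd -v SUBMIT_DEPTH=1 next.sh` under `DRY_RUN=1`. A job already at depth 3 stops submitting.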
- ==== How to find out which resources I requested for my job ====
-
- <
- hard resource_list: mem_free=50g
- hard resource_list:
- hard resource_list:
- hard resource_list:
- hard resource_list:
- hard resource_list: