Differences
This shows you the differences between two versions of the page.
grid [2017/09/27 20:45] popel [Rules] |
grid [2017/10/02 17:08] popel |
||
---|---|---|---|
Line 70: | Line 70: | ||
# We have used two handy qsub parameters: | # We have used two handy qsub parameters: | ||
# -cwd ... the script is executed in the current directory (the default is your home) | # -cwd ... the script is executed in the current directory (the default is your home) | ||
- | # -j y ... stdout and stderr outputs are merged and redirected to a file ('' | + | # -j y ... stdout and stderr outputs are merged and redirected to a file ('' |
# We have also provided two parameters for our script " | # We have also provided two parameters for our script " | ||
# The qsub prints something like | # The qsub prints something like | ||
Line 135: | Line 135: | ||
===== Advanced usage ===== | ===== Advanced usage ===== | ||
+ | |||
+ | '' | ||
+ | This way your job is submitted to the Troja queue. The default is '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | However, usually you should specify just the queue (troja-all.q vs. ms-all.q), not a particular machine, and instead use '' | ||
+ | |||
+ | '' | ||
+ | See '' | ||
'' | '' | ||
Define a priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. The default is 0, i.e. the highest possible priority. SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called '' | Define a priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. The default is 0, i.e. the highest possible priority. SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called '' | ||
+ | |||
'' | '' | ||
- | redirect std{out, | + | redirect std{out, |
'' | '' | ||
Line 174: | Line 185: | ||
'' | '' | ||
By default, all the resource requirements (specified with '' | By default, all the resource requirements (specified with '' | ||
+ | |||
+ | '' | ||
+ | This causes qsub to wait for the job to complete before exiting (with the same exit code as the job). Useful in scripts. | ||
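As a rough illustration of why ''-sync y'' is handy in scripts, here is a minimal sketch of a two-step pipeline that stops when the first step fails. The names ''step1.sh'' and ''step2.sh'' and the ''QSUB'' override variable are placeholders invented for this example; they are not part of SGE or of this page.

```shell
#!/bin/bash
# Sketch only: chain two jobs, relying on `qsub -sync y` to block until the
# job finishes and to return the job's exit code.
# QSUB is a hypothetical override so the logic can be dry-run without a cluster.
QSUB="${QSUB:-qsub}"

submit_and_wait() {
    # -cwd and -j y are the two handy parameters described earlier on this page.
    "$QSUB" -sync y -cwd -j y "$@"
}

run_pipeline() {
    # step1.sh and step2.sh are placeholder script names.
    submit_and_wait step1.sh || { echo "step1 failed, skipping step2" >&2; return 1; }
    submit_and_wait step2.sh
}
```

Calling ''run_pipeline'' submits the second job only after the first one has finished successfully. Without ''-sync y'', qsub returns right after submission, so its exit code says nothing about the job itself.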
'' | '' | ||
You can change some properties of already submitted jobs, which are still waiting in the queue (// | You can change some properties of already submitted jobs, which are still waiting in the queue (// | ||
- | '' | + | '' |
Find out all the gory details which are missing here. You'll have to do it one day anyway:-). | Find out all the gory details which are missing here. You'll have to do it one day anyway:-). | ||
Line 187: | Line 201: | ||
=== qunhold === | === qunhold === | ||
'' | '' | ||
+ | |||
+ | === sshcwd === | ||
+ | This is useful not only when sshing to sol machines. Add the following lines to your '' | ||
+ | |||
+ | < | ||
+ | function sshcwd () { | ||
+ | # save the current history so that it is available | ||
+ | # immediately on the remote machine | ||
+ | history -a; | ||
+ | # setup the working directory by setting WD | ||
+ | ssh -X -Y -C -t $@ " | ||
+ | } | ||
+ | |||
+ | # use WD to setup the working directory | ||
+ | if [ -n " | ||
+ | echo " | ||
+ | cd "$WD"; | ||
+ | fi | ||
+ | |||
+ | alias sol1=" | ||
+ | </ | ||
=== In-script options === | === In-script options === | ||
Line 202: | Line 237: | ||
=== Array jobs === | === Array jobs === | ||
- | If you have a set of tasks (of the same type) and want to run them on multiple machines, | + | If you have a set of tasks (of the same type) and want to run them on multiple machines, |
- | TODO | + | * '' |
- | ===== Job monitoring ===== | + | * environment variable '' |
+ | * output and error files '' | ||
+ | * '' | ||
+ | * environment variables '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
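The array-job items above can be sketched as a small worker script. This assumes the standard SGE behaviour that each task of a job submitted with ''qsub -t'' gets its own ''SGE_TASK_ID''; the function name ''process_task'' and the input list file are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch only: each array task picks "its" line from a list of inputs.
# SGE sets SGE_TASK_ID for every task of an array job; the optional second
# argument lets you try the function outside the cluster.
process_task() {
    list=$1                      # file with one input per line (placeholder)
    id=${2:-$SGE_TASK_ID}        # task number, 1-based
    input=$(sed -n "${id}p" "$list")
    echo "task $id: processing $input"
}
```

A worker script that calls ''process_task inputs.txt'' could then be submitted with something like ''qsub -cwd -j y -t 1-3 worker.sh'', so that the three tasks handle lines 1, 2 and 3 of the list.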
+ | ===== Job monitoring ===== | ||
* '' | * '' | ||
Line 222: | Line 264: | ||
* [[https:// | * [[https:// | ||
- | ===== Frequent and tricky problems ===== | + | === Other === |
- | + | * You can use environment variables | |
- | + | * One job can submit other jobs (but be careful with recursion :-)). A job submitted | |
- | ==== A submitted job can submit further jobs ==== | |
- | + | ||
- | Dan's earlier experience with a PBS cluster (not SGE) suggested that this is not possible. But it is possible, at least on our cluster. The commands | |
- | + | ||
- | + | ||
- | + | ||
- | ==== Environment variables, | |
- | + | ||
- | SGE runs scripts in a clean environment. So do not be surprised, | |
- | + | ||
- | I do not yet know exactly which of the files '' | |
- | + | ||
- | It also follows, for example, that without special handling the **Java** that gets used is | |
- | + | ||
- | java version " | + | |
- | | + | |
- | + | ||
- | If you want to run the submitted program in your favorite environment (e.g. the settings of '' | |
- | + | ||
- | ==== A different shell ==== | |
- | + | ||
- | To send a job to the queue, I have to write a dedicated script for it. Fine, I wrote one: | |
- | + | ||
- | < | + | |
- | # | + | |
- | program > log.out 2> log.err | + | |
- | </ | + | |
- | + | ||
- | When I run this script directly, the expected thing happens: the program's outputs are redirected to the files, and that is all. | |
- | + | ||
- | When I submit such a script, the program does **not run**. The log shows that the (standard error) output of the shell that ran my script contains the cryptic message " | |
- | + | ||
- | I will not keep you in suspense; here is the explanation: | |
- | + | ||
- | This is how you force SGE, | |
- | + | ||
- | < | + | |
- | qsub -S /bin/bash skript | + | |
- | </ | + | |
- | + | ||
- | Another option is to redirect stderr and stdout using csh syntax: | |
- | + | ||
- | < | + | |
- | ( command > | + | |
- | </ | + | |
- | + | ||
- | + | ||
- | ==== bashrc and similar files must not print anything to the console ==== | |
- | + | ||
- | Copied from [[http:// | |
- | + | ||
- | It is important that the files that are sourced during a login such as .bash_profile .profile .bashrc .login .cshrc don't produce any output when a non-interactive login is done. If they do, chances are that your job will run, but that the batch system is unable to deliver to you the standard output and error files. In that case the status of your job will be ' | |
< | < | ||
unset INTERACTIVE | unset INTERACTIVE | ||
Line 287: | Line 277: | ||
fi | fi | ||
</ | </ | ||
- | + | TODO: Is this restriction still true (for our cluster)? E.g. .bash_profile with / | |
- | ==== How to find out which resources I requested for my job ==== | |
- | + | ||
- | < | + | |
- | hard resource_list: mem_free=50g | + | |
- | hard resource_list: | + | |
- | hard resource_list: | + | |
- | hard resource_list: | + | |
- | hard resource_list: | + | |
- | hard resource_list: | + | |
- | + | ||
- | ==== How to reserve multiple cores on the same machine for one job ==== | |
- | + | ||
- | < | + | |
- | qsub -pe smp <number of cores> | |
- | </code> | + | |