
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

grid [2017/09/27 21:16]
popel [Advanced usage]
grid [2017/10/02 17:08]
popel
Line 70: Line 70:
   # We have used two handy qsub parameters:   # We have used two handy qsub parameters:
   #  -cwd  ... the script is executed in the current directory (the default is your home)   #  -cwd  ... the script is executed in the current directory (the default is your home)
-  #  -j y  ... stdout and stderr outputs are merged and redirected to a file (''script.sh.o*'')+  #  -j y  ... stdout and stderr outputs are merged and redirected to a file (''script.sh.o$JOB_ID'')
   # We have also provided two parameters for our script "Hello" and "World".   # We have also provided two parameters for our script "Hello" and "World".
   # The qsub prints something like   # The qsub prints something like
Line 148: Line 148:
 ''qsub **-p** -100'' ''qsub **-p** -100''
 Define the priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. The default is 0, i.e. the highest possible priority. SGE uses the priority to decide which pending job in the queue to start next (it computes a real number called ''prior'', reported by ''qstat'', which grows while the job waits in the queue). Note that once a job has started, you cannot "unschedule" it, so from that moment on its priority is irrelevant. Define the priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. The default is 0, i.e. the highest possible priority. SGE uses the priority to decide which pending job in the queue to start next (it computes a real number called ''prior'', reported by ''qstat'', which grows while the job waits in the queue). Note that once a job has started, you cannot "unschedule" it, so from that moment on its priority is irrelevant.
 +
 ''qsub **-o** LOG.stdout **-e** LOG.stderr'' ''qsub **-o** LOG.stdout **-e** LOG.stderr''
-redirect std{out,err} to separate files with given names+redirect std{out,err} to separate files with given names, instead of the defaults ''$JOB_NAME.o$JOB_ID'' and ''$JOB_NAME.e$JOB_ID''.
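As a rough illustration of the default naming, the sketch below prints the log file names one would expect for a given job. The helper ''default_log_names'' and the sample job name and IDs are hypothetical and serve only as an example; the patterns are the ones stated on this page (''$JOB_NAME.o$JOB_ID'' and ''$JOB_NAME.e$JOB_ID'', with ''.$TASK_ID'' appended for array jobs).

```shell
#!/bin/bash
# default_log_names JOB_NAME JOB_ID [TASK_ID]
# Hypothetical helper: prints the default stdout and stderr file names,
# following the patterns described on this page.
default_log_names() {
  local name="$1" id="$2" task="${3:-}"
  local suffix="$id"
  if [ -n "$task" ]; then
    suffix="$id.$task"   # array jobs get the task number appended
  fi
  echo "$name.o$suffix"
  echo "$name.e$suffix"
}

# Sample values (made up for the example)
default_log_names script.sh 123456
# prints:
#   script.sh.o123456
#   script.sh.e123456
```

With ''-j y'' only the ''.o'' file is written, and with ''-o''/''-e'' these defaults are replaced by the names you give.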
  
 ''qsub **-@** optionfile'' ''qsub **-@** optionfile''
Line 191: Line 192:
 You can change some properties of already submitted jobs, which are still waiting in the queue (//pending//). You can change some properties of already submitted jobs, which are still waiting in the queue (//pending//).
  
-''**man** qsub qstat qhold queue_conf sge_types complex''+''**man** qsub qstat qalter qhold queue_conf sge_types complex''
 Find out all the gory details which are missing here. You'll have to do it one day anyway:-). Find out all the gory details which are missing here. You'll have to do it one day anyway:-).
  
Line 200: Line 201:
 === qunhold === === qunhold ===
 ''~stepanek/bin/qunhold'' tries to keep the number of running SGE jobs under a given threshold: all jobs over the threshold are held. If the number of running jobs goes below the threshold (default: 100), 10 jobs (by default) are unheld. Beware: if your jobs submit new jobs, you can get far over the threshold! ''~stepanek/bin/qunhold'' tries to keep the number of running SGE jobs under a given threshold: all jobs over the threshold are held. If the number of running jobs goes below the threshold (default: 100), 10 jobs (by default) are unheld. Beware: if your jobs submit new jobs, you can get far over the threshold!
 +
 +=== sshcwd ===
 +This is useful not only when sshing to sol machines. Add the following lines to your ''~/.bashrc''.
 +
 +<code>
 +function sshcwd () {
 +  # save the current history so that it is available
 +  # immediately on the remote machine
 +  history -a;
 +  # setup the working directory by setting WD
 +  ssh -X -Y -C -t "$@" "WD='$PWD' /bin/bash --login -i";
 +}
 +
 +# use WD to setup the working directory
 +if [ -n "$WD" ]; then
 +  echo "Autochanging dir to $WD" >&2
 +  cd "$WD";
 +fi
 +
 +alias sol1="sshcwd sol1.ufal.hide.ms.mff.cuni.cz"
 +</code>
  
 === In-script options === === In-script options ===
Line 215: Line 237:
 === Array jobs === === Array jobs ===
  
-If you have a set of tasks (of the same type) and want to run them on multiple machines, use ''qsub -t''... +If you have a set of tasks (of the same type) and want to run them on multiple machines, use ''qsub -t''. 
-TODO+  * ''-t 1-n'' start array job with n jobs numbered 1 ... n 
 +  * environment variable ''SGE_TASK_ID'' 
 +  * output and error files ''$JOB_NAME.[eo]$JOB_ID.$TASK_ID'' 
 +  * ''-t m-n[:s]'' start array job with jobs m, m + s, ..., n 
 +  * environment variables ''SGE_TASK_FIRST'', ''SGE_TASK_LAST'', ''SGE_TASK_STEPSIZE'' 
 +  * ''-tc j'' run at most j jobs simultaneously 
 +  * ''-hold_jid_ad comma_separated_job_list'' array jobs that must finish before this job starts; task //i// of the current job depends only on task //i// of the specified jobs
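The options above can be combined into a small worker script. This is a minimal sketch, not a cluster-specific recipe: the function ''pick_task_input'', the variable ''INPUT_LIST'' and the file ''inputs.txt'' are assumptions made for illustration. Only ''SGE_TASK_ID'' comes from SGE. Each task reads the line of the input list that matches its task number.

```shell
#!/bin/bash
# Hypothetical worker script for an array job (a sketch under stated assumptions).

# Print the N-th line of a list file, so that task N handles input N.
pick_task_input() {
  sed -n "${1}p" "$2"
}

# SGE sets SGE_TASK_ID for array tasks. Outside an array job it may be
# unset or non-numeric, so fall back to task 1 in that case.
task_id="${SGE_TASK_ID:-1}"
case "$task_id" in
  ''|*[!0-9]*) task_id=1 ;;
esac

# inputs.txt is an assumed file with one input path per line.
input_list="${INPUT_LIST:-inputs.txt}"
if [ -r "$input_list" ]; then
  echo "Task $task_id processes: $(pick_task_input "$task_id" "$input_list")"
else
  echo "Task $task_id: no input list found at $input_list" >&2
fi
```

Saved as, say, ''worker.sh'' (a hypothetical name), it could be submitted with ''qsub -cwd -j y -t 1-3 worker.sh'', so that each of the three tasks picks its own line of the list.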
  
-===== Monitorování úloh =====+===== Job monitoring =====
  
   * ''qstat [-u user]'' -- list of jobs currently running / waiting in the queue   * ''qstat [-u user]'' -- list of jobs currently running / waiting in the queue
Line 236: Line 264:
   * [[https://ufaladm2.ufal.hide.ms.mff.cuni.cz/munin/ufal.hide.ms.mff.cuni.cz/lrc1.ufal.hide.ms.mff.cuni.cz/lrc_users.html|Munin: graph of cluster load by user]] (visible only from the ÚFAL network)   * [[https://ufaladm2.ufal.hide.ms.mff.cuni.cz/munin/ufal.hide.ms.mff.cuni.cz/lrc1.ufal.hide.ms.mff.cuni.cz/lrc_users.html|Munin: graph of cluster load by user]] (visible only from the ÚFAL network)
  
-===== Frequent and tricky problems ===== +=== Other === 
- +  * You can use the environment variables ''$JOB_ID'' and ''$JOB_NAME''.
- +  * One job can submit other jobs (but be careful with recursion:-)). A job submitted to the CPU cluster may submit GPU jobs (to the ''qpu.q'' queue). 
-==== A submitted job can itself submit jobs ==== +  * It is important that the files that are sourced during a login, such as .bash_profile, .profile, .bashrc, .login etc., don't produce any output when a non-interactive login is done. If they do, chances are that your job will run, but that the batch system will be unable to deliver the standard output and error files to you. In that case the status of your job will be 'E' after the job is finished. Here is an example of how you can test in a .bash_profile or .bashrc whether this is an interactive login:
- +
-Dan's earlier experience with a PBS cluster (not SGE) suggested that this is not possible. But it is, at least here. Besides the cluster head, ''qsub'' and related commands are also available on all machines of the cluster, provided of course that your ''.bashrc'', ''.cshrc'' etc. make sure that the SGE environment is initialized on them as well. +
- +
- +
- +
-==== Environment variables, setting up your own environment ==== +
- +
-SGE runs scripts in a clean environment. So don't be surprised if your script runs fine on the console but stops working once submitted. For example, it may not have found the programs it needs in ''$PATH''. +
- +
-I don't yet know exactly which of the files ''.login'', ''.bashrc'' etc. SGE runs, if any at all. On the other hand, I have verified experimentally that ''qsub -S /bin/bash script'' does not load any of ''.bashrc'', ''.bash_profile'', ''.login'', or ''.profile''. +
- +
-It also follows, for example, that unless you take care of it, the **Java** that gets used is +
- +
-   java version "1.5.0" +
-   gij (GNU libgcj) version 4.1.2 20070502 (Red Hat 4.1.2-12) +
- +
-If you want the submitted program to run in your preferred environment (e.g. with your ''PATH'' settings), you must source the relevant ''.bash*'' files in the wrapper script. However, it is always safer to write full paths everywhere than to rely on PATH. +
- +
-==== bashrc and similar files must not print anything to the console ==== +
- +
-Copied from [[http://www.sara.nl/userinfo/lisa/usage/batch/index.html]]. +
- +
-It is important that the files that are sourced during a login, such as .bash_profile .profile .bashrc .login .cshrc, don't produce any output when a non-interactive login is done. If they do, chances are that your job will run, but that the batch system will be unable to deliver the standard output and error files to you. In that case the status of your job will be 'E' after the job is finished. Here is an example of how you can test in a .bash_profile or .bashrc whether this is an interactive login: +
 <code> <code>
 unset INTERACTIVE unset INTERACTIVE
Line 273: Line 277:
 fi fi
 </code> </code>
- +TODO: Is this restriction still true (for our cluster)? E.g. .bash_profile with /net/projects/SGE/user/sge_profile prints info messages on stderr and it is OK.
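One way to check whether your login files stay quiet is sketched below. The helper ''produces_output'' is hypothetical, and running ''bash --login -c true'' only approximates what the batch system does, so treat the result as a hint rather than a definitive answer. The helper counts stdout and stderr together, which is stricter than the stderr case mentioned in the TODO above.

```shell
#!/bin/bash
# produces_output CMD [ARGS...]
# Hypothetical helper: succeeds (exit status 0) if the command prints
# any text to stdout or stderr, and fails otherwise.
produces_output() {
  local out
  out=$("$@" 2>&1 </dev/null)
  [ -n "$out" ]
}

# Assumed usage: start a non-interactive login shell that does nothing
# and see whether the login files printed anything.
# if produces_output bash --login -c true; then
#   echo "login files print something; consider silencing them" >&2
# fi
```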
-==== How to find out which resources I requested for my job ==== +
- +
-<code>qstat -j 973884,982737,984029,984030,984031,984034,984036 | grep resource +
hard resource_list:         mem_free=50g
-hard resource_list:         mem_free=200g +
-hard resource_list:         mem_free=16g +
-hard resource_list:         mem_free=16g +
-hard resource_list:         mem_free=16g +
-hard resource_list:         mem_free=31g</code>+
  
