slurm [2023/04/13 17:01] dusek Priority
slurm [2023/09/26 17:09] straka
</code>

==== Inspecting jobs ====

In order to inspect all running jobs on the cluster use:
=== Priority ===

When running ''srun'' or ''sbatch'', you can select a QOS (quality of service) with ''-q high'', ''-q normal'', ''-q low'' or ''-q preempt-low''. These correspond to priorities 300, 200, 100 and 100 respectively, with ''normal'' (200) being the default. Furthermore, the ''preempt-low'' QOS is actually preemptible: a job with ''normal'' or ''high'' QOS can interrupt your ''preempt-low'' job.
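
For batch jobs, the QOS can also be set inside the job script. The following is a minimal sketch; the script name ''job.sh'' is only illustrative, and it assumes the ''SLURM_JOB_QOS'' environment variable that ''sbatch'' exports to the job (outside Slurm, a placeholder is printed instead).

<code bash>
#!/bin/bash
#SBATCH -q low
# Illustrative job script (hypothetical name: job.sh).
# Submit with:  sbatch job.sh          -> uses QOS "low" from the directive above
#          or:  sbatch -q high job.sh  -> command-line options override #SBATCH lines
# Interactive equivalent:  srun -q high --pty bash

# sbatch exports the QOS of the allocation as SLURM_JOB_QOS;
# outside of Slurm the variable is unset, so print a placeholder instead.
qos="${SLURM_JOB_QOS:-none (not running under Slurm)}"
echo "Running with QOS: $qos"
</code>

To check which priority each QOS actually has on the cluster, ''sacctmgr show qos format=Name,Priority'' should list them.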

Preemption has probably not been used by anyone yet. Some documentation is available at https://slurm.schedmd.com/preempt.html. We use the REQUEUE mode: a preempted job is killed (very likely with some signal, so you could catch it and, for example, save a checkpoint; but currently I do not know the details) and then started again when resources become available.
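
The checkpoint idea above could look roughly like the following batch script. It is a sketch under assumptions, not a tested recipe: it assumes that preemption delivers SIGTERM to the batch shell some time before the job is killed (how long probably depends on the GraceTime setting described in the linked preemption docs). The checkpoint file ''checkpoint.txt'' and the ''MAX_STEPS'' loop are hypothetical stand-ins for real work; the requeue option (''-''''-requeue'') and ''SLURM_RESTART_COUNT'' are standard ''sbatch'' features.

<code bash>
#!/bin/bash
#SBATCH -q preempt-low
#SBATCH --requeue
# Sketch of a preemption-aware job. ASSUMPTION: when the job is preempted,
# Slurm delivers SIGTERM to this batch shell some time before killing it.

CKPT="${CKPT:-checkpoint.txt}"   # hypothetical checkpoint file
MAX_STEPS="${MAX_STEPS:-20}"     # hypothetical amount of work

# Resume from the last saved step (0 on the very first run).
step=$(cat "$CKPT" 2>/dev/null || echo 0)
echo "Restart count: ${SLURM_RESTART_COUNT:-0}, resuming at step $step"

save_and_exit() {
  echo "$step" > "$CKPT"
  echo "Caught SIGTERM, checkpointed at step $step" >&2
  exit 143   # 128 + SIGTERM (15)
}
trap save_and_exit TERM

while [ "$step" -lt "$MAX_STEPS" ]; do
  sleep 0.1 &   # stand-in for one unit of real work, run in the background
  wait $!       # 'wait' returns as soon as a trapped signal arrives
  step=$((step + 1))
done

echo "$step" > "$CKPT"
echo "Finished after $step steps"
</code>

Running the work in the background and calling ''wait'' matters here: bash postpones a trap until a foreground command finishes, whereas ''wait'' returns immediately when a trapped signal arrives, so the checkpoint is written promptly. After the job is requeued, the script starts again from the top and picks up the saved step.
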

<code>srun -p gpu-troja --constraint="gpuram48G|gpuram40G" --mem=64G --gres=gpu:2 --pty bash</code>
  * ''-''''-constraint="gpuram48G|gpuram40G"'' only consider nodes that have either the ''gpuram48G'' or the ''gpuram40G'' feature defined
==== Delete Job ====
Cancel a job by its ID:
<code>scancel <job_id></code>

or by its name:
<code>scancel -n <job_name></code>

To see all the available options, type:

<code>man scancel</code>

==== Basic commands on cluster machines ====