Hey all,
This is just a thought for the SLURMCluster for now (since that's what I'm familiar with) but similar options may be available in other clusters too. Currently, the cancel_command in the SLURMJob class is a bare "scancel".
cancel_command = "scancel"
(dask_jobqueue/slurm.py, line 15 at 8713202)
This means that, even when workers are shut down completely gracefully, the Slurm job is marked as CANCELLED. If the command were instead scancel --signal=SIGTERM, the job would be marked as COMPLETED. It's possible there are cases where we would actually want a job to be cancelled, which complicates this somewhat.
In the simple case, however, I think this could be implemented with a simple change of cancel_command to:
class SLURMJob(Job):
# Override class variables
submit_command = "sbatch"
cancel_command = "scancel --signal=SIGTERM"
config_name = "slurm"
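To illustrate the implication without needing dask-jobqueue or a Slurm installation, here is a self-contained sketch using stand-in classes that mirror the Job/SLURMJob structure above (the class names mimic the real ones, but nothing here imports dask-jobqueue). It also highlights one practical detail of the change: once cancel_command carries a flag, it is a multi-word string, so wherever the cluster shells out it would need to split the command rather than treat it as a single executable name; cancel_argv below is a hypothetical helper showing that splitting, not an existing dask-jobqueue function.

```python
import shlex

# Stand-in for dask_jobqueue's Job base class (sketch only).
class Job:
    submit_command = ""
    cancel_command = ""

class SLURMJob(Job):
    # Proposed change: deliver SIGTERM so a gracefully exiting worker
    # leaves the Slurm job in COMPLETED rather than CANCELLED.
    submit_command = "sbatch"
    cancel_command = "scancel --signal=SIGTERM"
    config_name = "slurm"

def cancel_argv(job_cls, job_id):
    # Split the command string so the extra flag survives as its own
    # argv entry when the cluster invokes scancel.
    return shlex.split(job_cls.cancel_command) + [job_id]

print(cancel_argv(SLURMJob, "12345"))
# -> ['scancel', '--signal=SIGTERM', '12345']
```

The same override could presumably be tried out today by subclassing SLURMJob in user code and pointing a cluster at the custom job class, so the behaviour could be exercised on a real Slurm system before changing the default.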
It'd be great to get some more thoughts on the implications for this.