Mailing List Archives
[HTCondor-users] How to access the number of times a job has been put on hold and released
- Date: Tue, 21 Aug 2018 01:41:29 +0000
- From: Greg.Hitchen@xxxxxxxx
- Subject: [HTCondor-users] How to access the number of times a job has been put on hold and released
Is it possible to determine the number of times a job has been put on hold and then released?
There doesn't seem to be a job ClassAd attribute that shows this directly.
Our users quite often write submit files that put a job on hold if it has been running for more than a specified time.
This is mainly to combat jobs that seem to fall into a black hole on execute nodes occasionally.
A periodic_release then allows them to try running somewhere else, hopefully successfully this time.
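For readers unfamiliar with this pattern, a minimal sketch of such a submit file fragment might look like the following. The 4-hour run limit, the 10-minute wait before release, and the hold reason text are hypothetical values chosen for illustration; the actual users' submit files may differ. The release expression assumes HoldReasonCode 3 (hold caused by a job policy expression such as periodic_hold), so that jobs held by the user or for other reasons are not released automatically.

```
# Hold a running job (JobStatus == 2) once it has been in that state
# for more than a hypothetical limit of 4 hours.
periodic_hold = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > 4 * 3600)
periodic_hold_reason = "Exceeded hypothetical 4 hour run limit"

# Release a held job (JobStatus == 5) after a hypothetical 10 minute wait,
# but only if it was held by a policy expression (assumed HoldReasonCode 3),
# so that it can be matched to a different execute node.
periodic_release = (JobStatus == 5) && (HoldReasonCode == 3) && ((time() - EnteredCurrentStatus) > 600)
```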
We have a user who would like to remove a job if it has been put on hold and released more than X times.
For example, using an imaginary job ClassAd attribute NumHoldsReleases, we could change:
on_exit_remove = (ExitCode == 0) && (ExitBySignal == False)
to
on_exit_remove = ((ExitCode == 0) && (ExitBySignal == False)) || (NumHoldsReleases > 5)
Is there a way to achieve this?
Thanks.
Cheers
Greg