
[HTCondor-users] Avoiding rerun of jobs



Hi all,

while the docs hint at the defaults doing what we want, I am a bit confused over what to do to *definitely* get our desired state.

- My desired situation is for a job I submit to never be rerun after it has started once.

So far, I have used
	max_retries = 0
and it works in most cases. However, it checks for
	NumJobCompletions > JobMaxRetries || <other stuff>
which does not take Condor/Host/... crashes and the like into account.
While such crashes are relatively rare, we run enough jobs for them to become noticeable.

Now, I am considering setting
	OnExitRemove = True
but I am not sure whether this, too, fails to trigger on failures that are not caused by the job itself. The condor_submit docs hint that this is what we want.

I realise this can also be achieved using Requirements+PeriodicRemove [1], but I am unsure which combinations may have race conditions - e.g. that NumJobMatches would be incremented before the startd checks Requirements, and so before the job actually starts.
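
For concreteness, the combination I have in mind would look roughly like the sketch below. It assumes that NumJobStarts is the right counter to test and that JobStatus == 1 means Idle; I have not verified that this avoids the kind of race described above.
	requirements    = NumJobStarts == 0
	periodic_remove = NumJobStarts > 0 && JobStatus == 1
The idea is that Requirements stops a job from matching again once it has started, and PeriodicRemove then cleans up any job that falls back to Idle after a start.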

Thanks for any help and input!

Cheers,
Max

[1]
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAvoidJobRestarts
