Hi all,

while the docs hint that the defaults do what we want, I am a bit confused about what to do to *definitely* get our desired state.

My desired situation is for a job I submit to never be rerun after it has started once. So far, I have used max_retries = 0, and it works in most cases. However, it checks for NumJobCompletions > JobMaxRetries || <other stuff>, which does not take Condor/Host/... crashes and the like into account. While this is relatively rare, we run enough jobs for it to become noticeable.

Now I am considering setting OnExitRemove = True, but I am not sure whether this, too, fails to trigger for non-job failures. The condor_submit docs hint that this is what we want.

I realise this can also be achieved using Requirements+PeriodicRemove [1], but I am unsure which combinations may have race conditions, e.g. that NumJobMatches would be incremented before the startd checks Requirements and before the job actually starts.

Thanks for any help and input!

Cheers,
Max

[1] https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAvoidJobRestarts
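
To make the question concrete, the three variants I am weighing would look roughly like this in a submit description file. This is only a sketch: the exact expressions in variant C are my assumption of how a Requirements+PeriodicRemove combination could be written using NumJobStarts, and I have not verified that they are free of the race I describe above.

```
# Use ONE of the variants below, not all three together.

# Variant A: what I use today
max_retries = 0

# Variant B: remove the job from the queue whenever it exits
# on_exit_remove = True

# Variant C: Requirements + PeriodicRemove (assumed expressions)
# Only match a job that has never started ...
# requirements    = NumJobStarts == 0
# ... and remove it if it returns to Idle (JobStatus == 1) after a start.
# periodic_remove = NumJobStarts > 0 && JobStatus == 1
```

What I cannot tell from the docs is whether A or B also covers the case where the job never reaches a normal exit, and whether the counter used in C is updated early enough.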