
[HTCondor-users] Job in held status without any HoldReasonCode



Hello Experts,

I saw one of our jobs in held status, and from the executor's partitionable slot logs I found that it was held because of an OOM event. We have conditions to automatically release jobs based on HoldReasonCode, but in this case the job did not have any such code associated with it. I am planning to add a new condition, isUndefined(HoldReasonCode), to the submit file to handle this scenario. Is this a bug or expected behavior?
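
For reference, a minimal sketch of the condition I have in mind, assuming the release logic lives in periodic_release in the submit file. The code value 34 is only a placeholder for whichever HoldReasonCode values we already match on, not a claim about what this job should have reported:

```
# Hypothetical submit-file fragment (placeholder hold code, see note above).
# Release the job if it matches a known hold code, or if it is held
# without any HoldReasonCode set at all.
periodic_release = isUndefined(HoldReasonCode) || (HoldReasonCode == 34)
```

Putting the isUndefined() test in the expression means a held job with no code still evaluates to true, instead of the comparison against an undefined attribute leaving the whole expression undefined.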

08/02/19 01:25:43 (pid:988203) JobExit() failed, waiting for job lease to expire or for a reconnect attempt
08/02/19 01:25:43 (pid:988203) Returning from CStarter::JobReaper()
08/02/19 01:25:43 (pid:988203) Job was held due to OOM event: Job has encountered an out-of-memory event.
08/02/19 01:25:43 (pid:988203) Got SIGQUIT. Performing fast shutdown.

Thanks & Regards,
Vikrant Aggarwal