
Re: [HTCondor-users] Job breaching memory in condor version 24.0.1 not getting held



On 11/12/24 15:49, Vikrant Aggarwal wrote:
Hello Experts,

I have a simple Python script that allocates 400 GB of virtual memory and then touches the pages in a loop to increase its RSS. I expect the job to be put on hold once it exceeds the slot memory of 20 GB. Instead, it is killed as soon as it exceeds the limit, switches from running to idle, and then starts running again.
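
For readers who want to reproduce this, a test program of the kind described might look roughly like the sketch below. The original script was not posted, so the function name `touch_pages`, the use of an anonymous `mmap`, and the one-write-per-page loop are all assumptions made for illustration.

```python
import mmap


def touch_pages(total_bytes, step_bytes=mmap.PAGESIZE):
    """Map total_bytes of anonymous memory, then write one byte per step.

    Hypothetical helper, not the original poster's code. The mapping only
    reserves virtual address space; resident memory grows as pages are
    written in the loop.
    """
    # Anonymous mapping (fileno -1): no physical pages are used until touched.
    region = mmap.mmap(-1, total_bytes)
    touched = 0
    for offset in range(0, total_bytes, step_bytes):
        region[offset] = 1  # writing to the page forces it to become resident
        touched += 1
    # Return the mapping so the caller keeps it alive and the pages stay resident.
    return region, touched
```

Calling `touch_pages(400 * 1024**3)` from a job running in a 20 GB slot should, under these assumptions, push resident memory past the slot limit well before the loop finishes. Whether the initial 400 GB mapping succeeds at all depends on the machine's overcommit settings.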


Hi Vikrant:

This is fixed in HTCondor version 24.0.2 with the configuration knob STARTER_ALWAYS_HOLD_ON_OOM. We hope to have that release out soon.


-greg