I know a periodic_remove condition can be set in the job definition, but I want to know what the expected behavior is when a job exceeds its allocated memory with the following settings:
BASE_CGROUP = htcondor
CGROUP_IGNORE_CACHE_MEMORY = true
CGROUP_MEMORY_LIMIT_POLICY = hard
Earlier, we used the following setting on CentOS 7 and Rocky 8 machines to put a job that exceeded its memory into held status. Now, on Rocky 9 with HTCondor 24.0.1, this setting doesn't make a difference.
IGNORE_LEAF_OOM = False
If a job is not getting held, it should at least get removed from the queue.
I see the following in the cgroup output as soon as the job breaches memory:
low 0
high 0
max 18
oom 1
oom_kill 5 <<<<<
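For anyone who wants to check these counters on their own node, here is a minimal sketch that parses text in the cgroup v2 memory.events format and reports whether any process was OOM-killed. The function names are hypothetical helpers, not part of HTCondor, and the sample text is just the output quoted above; the path to the job's memory.events file depends on the local cgroup layout and is not shown.

```python
def parse_memory_events(text):
    """Parse "key value" lines (cgroup v2 memory.events style) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counters[parts[0]] = int(parts[1])
    return counters


def was_oom_killed(counters):
    """True if the kernel OOM killer has killed at least one process."""
    return counters.get("oom_kill", 0) > 0


# Sample taken from the output quoted above (hypothetical usage).
sample = "low 0\nhigh 0\nmax 18\noom 1\noom_kill 5\n"
events = parse_memory_events(sample)
print(events["oom_kill"], was_oom_killed(events))
```

A nonzero oom_kill only shows that the kernel killed processes in that cgroup; it does not say what HTCondor then did with the job.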
Thanks & Regards,
Vikrant Aggarwal