Re: [HTCondor-users] Limiting memory used on the worker node with c-groups
On 4/30/20 6:19 AM, tpdownes@xxxxxxxxx wrote:
> I do think your problem is as simple as Thomas' question: figuring out
> why oom_control is set to disabled. These cgroup settings are inherited
> hierarchically so it could be the htcondor group itself or a cgroup
> above it. It could even be set system-wide.
Hello Tom,
It appears that the OOM killer is enabled at the top level of the
"memory" cgroup as well as in "htcondor" below it, but becomes disabled
at the slot level:
cat /sys/fs/cgroup/memory/htcondor/memory.oom_control
oom_kill_disable 0
under_oom 0
cat /sys/fs/cgroup/memory/htcondor/condor_dlocal_htcondor_slot1@xxxxxxxxxxxxxxxxx/memory.oom_control
oom_kill_disable 1
under_oom 0
I do not know why...
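For reference, a small loop like the one below (only a sketch;
SLOT_CGROUP is a placeholder for the actual slot directory) prints
oom_control at every level, from the slot up to the root of the
"memory" controller:

# SLOT_CGROUP is a placeholder; substitute the real slot cgroup path
SLOT_CGROUP=/sys/fs/cgroup/memory/htcondor/<slot-directory>
d="$SLOT_CGROUP"
while [ "$d" != "/sys/fs/cgroup" ]; do
    echo "== $d"
    cat "$d/memory.oom_control"
    d=$(dirname "$d")
done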
The defined behavior is:

    When the OOM killer is disabled, tasks that attempt to use more
    memory than they are allowed are paused until additional memory is
    freed.
So the paused processes would correspond to the processes in "D"
state? On machines with processes in D state, some HTCondor slots have
under_oom set to 1, which seems consistent.
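If useful, the two commands below (assuming standard ps, awk and grep)
list the processes in uninterruptible sleep and show under_oom for all
slots in one pass:

# List processes in uninterruptible sleep ("D" state)
ps -eo pid,stat,comm | awk '$2 ~ /^D/'
# Show under_oom for every slot cgroup at once
grep under_oom /sys/fs/cgroup/memory/htcondor/*/memory.oom_control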
> In real-world situations, most jobs can sneak above their memory limit
> and it's not a big deal because other jobs are below their limit. Why
> make it a big deal?
I started to look at this with the aim of preventing a whole worker
node from becoming "hung" through memory exhaustion, where the only way
to reboot it is to manually power-cycle it (which we cannot do in the
current lockdown, so we currently have worker nodes unavailable).
I also looked at SYSTEM_PERIODIC_REMOVE on the submitter, but I learned
that it is slow to react: a pathological job could harm a worker node
before the job gets removed...
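For illustration only (not necessarily the exact expression we tried),
such a policy on the submitter might look like:

# In the schedd's configuration. MemoryUsage is only refreshed at the
# starter's update interval (a few minutes by default), which is why
# this reacts slowly to a runaway job.
SYSTEM_PERIODIC_REMOVE = (MemoryUsage > 2 * RequestMemory)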
In fact I am not too unhappy with CGROUP_MEMORY_LIMIT_POLICY = hard,
which I am testing on a single worker node, but I may not yet have seen
all the drawbacks...
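For completeness, this is a one-line change in the worker node's
configuration (followed by a condor_reconfig); as I understand it, with
"hard" the slot's cgroup memory limit is taken from the job's memory
request, so a job exceeding it is killed rather than pausing the node:

# In condor_config(.local) on the worker node
CGROUP_MEMORY_LIMIT_POLICY = hard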
I have not yet decided which setting to choose. Having processes paused
is painful; the node does not recover by itself. Still, it is better
than losing it entirely.
Thank you very much for your advice.
JM
--
------------------------------------------------------------------------
Jean-michel BARBET | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite | E-Mail: barbet@xxxxxxxxxxxxxxxxx
------------------------------------------------------------------------