Re: [HTCondor-users] Memory accounting issue with cgroups
- Date: Mon, 29 May 2023 18:17:35 +0000
- From: Marco van Zwetselaar <zwets@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Memory accounting issue with cgroups
Thanks Greg,
I've been experimenting a bit, using information from
https://facebookmicrosites.github.io/cgroup2/docs/overview.html.
Relevant quotes there:
- memory.high is the memory usage throttle limit. This is the
main mechanism to control a cgroup's memory use. If a cgroup's
memory use goes over the high boundary specified here, the
cgroup's processes are throttled and put under heavy reclaim
pressure. The default is max, meaning there is no limit.
- memory.max is the memory usage hard limit, acting as the final
protection mechanism: If a cgroup's memory usage reaches this
limit and can't be reduced, the system OOM killer is invoked on
the cgroup.
- Under certain circumstances, usage may go over the memory.high
limit temporarily. When the high limit is used and monitored
properly, memory.max serves mainly to provide the final safety
net. The default is max.
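For reference, the two knobs quoted above are plain files under the cgroup v2 hierarchy. A minimal sketch of how they look and how a limit gets imposed; it runs against a scratch directory so it needs no root (on a real node the files live under /sys/fs/cgroup and the cgroup name HTCondor uses will differ):

```shell
# Scratch directory standing in for a real cgroup v2 directory.
CG=$(mktemp -d)

# Defaults mirror a fresh cgroup: both knobs start at "max" (no limit).
echo max > "$CG/memory.high"
echo max > "$CG/memory.max"

# Impose only the hard limit, as described below for HTCondor 10.6
# with request_mem = 256G (256 GiB in bytes):
echo $((256 * 1024 * 1024 * 1024)) > "$CG/memory.max"
cat "$CG/memory.max"
```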
The HTCondor 10.6 "memory.max" setting worked for one type of job
(keeping its RSS within request_mem), but for another type it
almost immediately OOM-ed every instance. That was correct (in
principle) from the condor viewpoint: the instances neatly went on
Hold. (The only minor inconvenience is that the system sends an
email for every OOM.)
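As an aside, one way to confirm that a Hold was kernel-driven is the oom_kill counter in the cgroup's memory.events file. A sketch, with fabricated sample data in a scratch file (on a node you would read the real memory.events under /sys/fs/cgroup):

```shell
# Fabricated memory.events contents; the real file holds these same
# space-separated counters, maintained by the kernel.
EV=$(mktemp)
printf 'low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n' > "$EV"

# Pull the oom_kill count; nonzero means the kernel killed a task
# in this cgroup for exceeding memory.max.
OOM_KILLS=$(awk '$1 == "oom_kill" { print $2 }' "$EV")
echo "oom_kill events: $OOM_KILLS"
```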
However, when I manually set memory.high at 50% of memory.max, the
offending jobs crept up to about 110% of that level and kept
running. The memory.pressure (see doc here:
https://facebookmicrosites.github.io/cgroup2/docs/pressure-metrics.html)
then slowly went up to 98.5%, supposedly meaning that the job was
spending 98.5% of its time stalling for memory pages to be swapped
in.
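The 98.5% figure comes from the "full" line of the cgroup's memory.pressure (PSI) file. A sketch of extracting it, using fabricated sample data in a scratch file (the real file sits next to memory.high in the cgroup directory):

```shell
# Fabricated memory.pressure contents in the kernel's PSI format:
# "some" = at least one task stalled, "full" = all tasks stalled.
P=$(mktemp)
printf 'some avg10=99.02 avg60=98.71 avg300=97.80 total=1234567\n' > "$P"
printf 'full avg10=98.50 avg60=98.10 avg300=96.90 total=1200000\n' >> "$P"

# A "full" avg10 near 100 means nearly all runnable time is spent
# stalled on memory, matching the 98.5% observed above.
FULL_AVG10=$(awk '$1 == "full" { sub("avg10=", "", $2); print $2 }' "$P")
echo "full avg10: $FULL_AVG10"
```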
This is on a machine with no disc swap, and plenty of spare memory
above memory.max (256G requested out of 768G), so I suppose no
actual swapping was taking place, just physical pages being marked
"free" and "taken" again. (Or something like that; I'm no expert in
Linux kernel memory management.)
Increasing memory.high to 90% of memory.max made consumption creep
up again, levelling just below memory.max, with no OOMs. Reducing
memory.high also worked, and consumption would go down again. Very
neat.
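The tuning described above (memory.high pinned to a fraction of memory.max) can be sketched as a small helper; scratch directory again, and the helper name is my own invention, not anything HTCondor ships:

```shell
# Scratch cgroup with a 256 GiB hard limit already in place.
CG=$(mktemp -d)
echo 274877906944 > "$CG/memory.max"

# Hypothetical helper: set memory.high to <percent> of memory.max.
set_high_pct() {  # usage: set_high_pct <cgroup-dir> <percent>
    max=$(cat "$1/memory.max")
    echo $((max * $2 / 100)) > "$1/memory.high"
}

# The ~90% setting that kept jobs just under the hard limit, no OOMs.
set_high_pct "$CG" 90
cat "$CG/memory.high"
```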
Unfortunately, I didn't try lifting memory.high while the job was at
memory.max, to see if memory.max would pressure the job before
OOM-ing it (provided that it approached memory.max slowly). Its
description (above) appears to suggest this: "if [memory
consumption] reaches this limit **and can't be reduced** [then OOM
ensues]".
I'm in the dark about what "and can't be reduced" means. The OOM
came almost immediately after starting the job, whereas with
memory.high set at 90% of max, the job ran to completion.
Either way, it would seem that setting memory.high at ~90% of
memory.max would be appropriate. I haven't yet thought about
memory.min/low.
Cheers
Marco
On 20/05/2023 23:43, Greg Thain via HTCondor-users wrote:
On 5/20/23 5:03 AM, Marco van Zwetselaar wrote:
I guess my mental picture of memory.high as a yellow card, and
memory.max as the red card was incorrect. It's more like rugby:
the referee's stare is enough. :-)
The kernel docs are a little vague about the difference between
"high" and "max", saying that usually a cgroup gets OOM killed
when it hits "high", but in some cases can go all the way up to
"max" before the OOM arrives. It isn't clear to me if this means
maybe a page or two more memory, in order to deliver the signal,
or potentially some unbounded amount of memory. Given that, I
chose to have condor only set "max".
If you will excuse me stretching your metaphor, "high" is the
moment the red card goes into the air, but "max" is when the
guilty party actually leaves the pitch. "memory.min" is like our
youth leagues here, where there is an unwritten understanding that
if one team can't field some minimum number of players (seven?),
the opposing team (if able) will loan them some players in order
that the kids can still get a game in (despite a forfeit on the
books). And I have no good idea right now what htcondor should
set "memory.low" to.
On a side note to the Condor devs: my config has
'DISABLE_SWAP_FOR_JOB = true'. Shouldn't that translate to
'memory.swap.max = 0' on the cgroup (currently shows "max")?
The cgroup v2 code path doesn't set this. I'll write a PR to fix
this.
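For completeness, the intended effect of that fix is tiny at the cgroup level: with DISABLE_SWAP_FOR_JOB = true, the job's cgroup should get memory.swap.max = 0 rather than the kernel default of "max". A sketch against a scratch directory (real path under /sys/fs/cgroup; exactly what the PR will write is an assumption on my part):

```shell
# Scratch directory standing in for the job's cgroup v2 directory.
CG=$(mktemp -d)
echo max > "$CG/memory.swap.max"   # kernel default: unlimited swap

# With swap disabled for the job, the starter would write 0,
# forbidding the cgroup any swap use at all.
echo 0 > "$CG/memory.swap.max"
cat "$CG/memory.swap.max"
```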
Thanks,
-greg
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/