Re: [HTCondor-users] Memory enforcement with cgroups v2 on EL9
- Date: Fri, 16 Aug 2024 11:51:52 +0200
- From: Matthias Schnepf <matthias.schnepf@xxxxxxx>
- Subject: Re: [HTCondor-users] Memory enforcement with cgroups v2 on EL9
Hi Tim,
I tested HTCondor version 23.9.6 with CGROUP_IGNORE_CACHE_MEMORY = TRUE
on some RHEL 8 (yes, 8, not 9) worker nodes with cgroup v2.
I am not that familiar with memory management in Linux, but I will try to
share my experience and opinions.
The memory usage shown in HTCondor is now lower and no longer includes
the memory cache. However, this confuses me, since the memory consumption
without inactive_file and inactive_anon from the memory.stat file is
lower than RES (resident size) in htop. The resident size is roughly the
memory.current value minus the file cache (inactive_file from the
memory.stat file). That should be similar to what was reported as memory
usage with cgroup v1.
E.g. one job had:
memory.current  2033 MB
inactive_anon   1809 MB
inactive_file    172 MB
htop (RES)     ~1800 MB
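
As a quick check of the arithmetic, here is a minimal Python sketch using the values from this job. The variable names are illustrative only and are not taken from HTCondor.

```python
# Values from the example job above, in MB.
memory_current = 2033
inactive_anon = 1809
inactive_file = 172

# memory.current without the inactive file cache (close to RES in htop).
without_file_cache = memory_current - inactive_file

# memory.current without inactive_file and inactive_anon.
without_inactive = memory_current - inactive_file - inactive_anon

print(without_file_cache)  # 1861
print(without_inactive)    # 52
```

The first value is close to the ~1800 MB that htop shows as RES, while the second is far below it.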
I played a bit with some memory limits. Programs work with a bit more
memory than memory.current (at peak) - inactive_file - inactive_anon. The
memory usage with CGROUP_IGNORE_CACHE_MEMORY = TRUE is valid but
unusual. I think it would be useful to have the memory usage reported as
RES, i.e. memory.current - inactive_file, since that is what most users
are familiar with and can compare with local runs via htop or similar programs.
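
A rough sketch of how such a value could be derived from the cgroup v2 files, assuming memory.stat holds one "key value" pair per line with values in bytes. The function names and the sample numbers are hypothetical, and this is not how HTCondor itself computes its memory usage.

```python
def parse_memory_stat(text):
    """Parse cgroup v2 memory.stat content ("key value" per line, bytes)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def usage_without_file_cache(memory_current, stat_text):
    """Return memory.current minus inactive_file, in bytes."""
    stats = parse_memory_stat(stat_text)
    return memory_current - stats.get("inactive_file", 0)


# Made-up sample values in bytes, for illustration only.
sample_stat = "anon 1500\nfile 700\ninactive_anon 1000\ninactive_file 400\n"
print(usage_without_file_cache(2000, sample_stat))  # 1600
```

On a worker node the two inputs would come from the memory.current and memory.stat files in the job's cgroup directory; the exact path depends on the local setup.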
Sorry for asking about an additional option :-)
Best regards,
Matthias
On 8/8/24 10:09 PM, Tim Theisen via HTCondor-users wrote:
The recent HTCondor 23.9.6 release has code to not count kernel cache
memory in the job's memory usage. This code was originally slated for
the 23.10 release. However, since sites are having difficulty on EL9
with jobs unexpectedly going over their memory limit, we backported
this code to 23.9 after the code freeze. Since this has not gone
through our complete testing cycle, this code is disabled by default.
We plan to have it enabled for the 23.10 release.
If you are running EL9 EPs and want to enable this code, set
CGROUP_IGNORE_CACHE_MEMORY to TRUE. If you do so, please let us know
how it goes. We do not expect any problems with this change.
...Tim