Re: [HTCondor-users] Memory enforcement with cgroups v2 on EL9



Hi Tim,

I tested HTCondor 23.9.6 with CGROUP_IGNORE_CACHE_MEMORY = TRUE on some RHEL 8 (yes, 8, not 9) worker nodes running cgroup v2. I am not that familiar with memory management in Linux, but I will try to share my experience and opinions.

The memory usage reported by HTCondor is now lower and no longer includes the memory cache. However, it confuses me, because the memory consumption without inactive_file and inactive_anon (both from the memory.stat file) is lower than RES (resident size) in htop. The resident size corresponds to the memory.current value minus the file cache (inactive_file in memory.stat). That should be similar to what was reported as memory usage with cgroup v1.
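For anyone who wants to repeat the comparison, here is a minimal Python sketch of reading these values. It assumes the usual cgroup v2 layout of memory.stat (one "name value" pair per line, values in bytes); the function names and the sample numbers are only illustrative and are not taken from HTCondor.

```python
def parse_memory_stat(text):
    """Parse the contents of a cgroup v2 memory.stat file into a dict of byte counts."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def usage_without_file_cache(current_bytes, stats):
    """memory.current minus inactive_file, the figure compared with RES above."""
    return current_bytes - stats.get("inactive_file", 0)


# Made-up sample values in bytes, only to show the file format.
sample_stat = "anon 1000\nfile 500\ninactive_anon 800\ninactive_file 300\n"
stats = parse_memory_stat(sample_stat)
print(usage_without_file_cache(2000, stats))  # 1700
```

On a real node the text would come from the memory.stat and memory.current files in the job's cgroup directory under /sys/fs/cgroup; the exact path depends on the site configuration and is not given here.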

E.g. one job had:

memory.current  2033MB
inactive_anon   1809MB
inactive_file    172MB

htop (RES) ~ 1800MB
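
Worked out from the numbers above, the two ways of counting give quite different results. A minimal sketch, assuming all values are in MB as listed and that the reported usage with the new option subtracts both inactive_file and inactive_anon, as described in this mail (the variable names are only illustrative):

```python
# Values from the example job, in MB.
memory_current = 2033
inactive_anon = 1809
inactive_file = 172

# memory.current minus the file cache only: close to RES in htop (~1800 MB).
without_file_cache = memory_current - inactive_file

# memory.current minus both inactive_file and inactive_anon.
without_inactive = memory_current - inactive_file - inactive_anon

print(without_file_cache)  # 1861
print(without_inactive)    # 52
```

The first figure is in the same range as the RES value from htop, while the second is far below it.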

I played a bit with some memory limits. Programs work with a limit slightly above memory.current (at peak) - inactive_file - inactive_anon. The memory usage with CGROUP_IGNORE_CACHE_MEMORY = TRUE is valid but unusual. I think it would be useful to have the memory usage reported as RES, i.e. memory.current - inactive_file, since that is what most users are familiar with and can compare with local runs via htop or similar programs.
Sorry for asking about an additional option :-)

Best regards,

Matthias


On 8/8/24 10:09 PM, Tim Theisen via HTCondor-users wrote:
The recent HTCondor 23.9.6 release has code to not count kernel cache memory in the job's memory usage. This code was originally slated for the 23.10 release. However, since sites are having difficulty on EL9 with jobs unexpectedly going over their memory limit, we backported this code to 23.9 after the code freeze. Since this has not gone through our complete testing cycle, this code is disabled by default. We plan to have it enabled for the 23.10 release.

If you are running EL9 EPs and want to enable this code, set CGROUP_IGNORE_CACHE_MEMORY to TRUE. If you do so, please let us know how it goes. We do not expect any problems with this change.

...Tim