Hi David,

On Mon, 2024-09-09 at 15:55 +0300, David Cohen wrote:
> I'm very sorry to hear that, as our users are suffering from both
> lack of memory reporting, and jobs unexpectedly going over their
> memory limit.
> It is to the point they find it hard to get any work done, and I'm
> getting all the complaints and user's frustration.
> When can we expect the cgroups v2 issues to be fixed?

We experienced similar issues after the update to EL9/HTC-23. After
following the hint from the KIT colleagues (thanks!) and changing the
configuration accordingly, we haven't received any more complaints.

You might want to try a similar setting as well (excerpt from our exec
node configuration handling memory enforcement):

---
# try memory enforcement like the guys from KIT
# see https://www-auth.cs.wisc.edu/lists/htcondor-users/2024-July/msg00103.shtml
CGROUP_MEMORY_LIMIT_POLICY = custom
CGROUP_HARD_MEMORY_LIMIT_EXPR = 2 * Target.RequestMemory
DISABLE_SWAP_FOR_JOB = true
---

We are running EL9+HTC-23.0.14 btw.

Cheers,
Andreas

--
| Andreas Haupt   | E-Mail: andreas.haupt@xxxxxxx
| DESY, Zeuthen   | WWW: http://www.zeuthen.desy.de/~ahaupt
| Platanenallee 6 | Phone: +49/33762/7-7359
| D-15738 Zeuthen |