Hi,

One thing we have noticed, and it seems it hasn't changed, is that the reported memory usage is taken from the ClassAd attribute. This is highly inaccurate for two reasons:
* It's sampled (by default every five minutes), so this value might be quite old by the time the OOM event arrives.
* It uses the old memory tracking system based on RSS, which doesn't take into account things like tmpfs (for instance, some of our users use /dev/shm).
This inaccuracy results in at least one bug: when tmpfs fills up the requested memory, it is treated as "the system running out of memory". With the default `IGNORE_LEAF_OOM` value of true (still the case in 10.0), this causes jobs to hang, waiting forever for the system to free up memory (when that's not the issue at all).
It also confuses the users, because they sometimes see a reported "peak usage" much lower than the limit, and it's not clear to them that something else might be going on.
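To illustrate the gap between an RSS-only figure and what the cgroup actually charges, here is a minimal sketch that parses the text of a cgroup v1 `memory.stat` file. The function names are made up for this example, and the "rss + cache" sum is only a rough approximation of the charged memory (tmpfs pages show up under `cache` and `shmem`, not under `rss`); this is not how HTCondor computes anything.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines from a cgroup v1 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name integer" lines.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def rss_vs_charged(stats):
    """Compare the RSS-only figure with a rough estimate of charged memory.

    Assumption for illustration: charged memory is approximated as
    rss + cache, where cache includes tmpfs (shmem) pages.
    """
    rss = stats.get("rss", 0)
    cache = stats.get("cache", 0)
    shmem = stats.get("shmem", 0)
    return {"rss": rss, "shmem": shmem, "rss_plus_cache": rss + cache}
```

With a job that keeps most of its data in /dev/shm, `rss` can stay small while `rss_plus_cache` sits right at the limit, which would explain a reported peak that looks far below it.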
So, would it be possible to make it get the value directly from the cgroup, i.e. `memory.max_usage_in_bytes` or `memory.memsw.max_usage_in_bytes`? I'm talking about cgroups v1, I'm not sure how this would affect v2.
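As a rough sketch of what I mean (cgroups v1 only), the peak could be read straight from the two files in the job's cgroup directory. The function name is hypothetical and the directory has to be supplied by the caller; I am not assuming anything about where HTCondor places the job's cgroup.

```python
from pathlib import Path

# The two cgroup v1 peak counters mentioned above.
PEAK_FILES = ("memory.max_usage_in_bytes", "memory.memsw.max_usage_in_bytes")


def read_peak_usage(cgroup_dir):
    """Return the peak counters found in a cgroup v1 memory directory.

    A missing or unreadable file (for example, memsw when swap accounting
    is disabled) is reported as None instead of raising.
    """
    cgroup_dir = Path(cgroup_dir)
    peaks = {}
    for name in PEAK_FILES:
        try:
            peaks[name] = int((cgroup_dir / name).read_text().strip())
        except (OSError, ValueError):
            peaks[name] = None
    return peaks
```

Since these counters are maintained by the kernel, reading them when the OOM event arrives would avoid both the sampling delay and the RSS-only accounting.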
Best,
Joan

On 19/5/23 10:37, Jan van Eldik wrote:
Hello Marco,

Could this be the issue addressed in https://github.com/htcondor/htcondor/commit/3c1b39bf5607d7485aa36e90ab8f6de6f99baeb0 ?

Release condor-10.6.0-0.644330.el9.x86_64 includes this, and we have not observed any cgroups-v2 related crashes on our EL9 servers since we deployed it a few weeks ago.

Hope this helps,
regards,
Jan
--
Dr. Joan Josep Piles-Contreras
ZWE Scientific Computing
Max Planck Institute for Intelligent Systems
(p) +49 7071 601 1750