Re: [HTCondor-users] Job log not reporting Memory usage

Hi David,

On Mon, 2024-09-09 at 15:55 +0300, David Cohen wrote:
> I'm very sorry to hear that, as our users are suffering from both
> lack of memory reporting, and jobs unexpectedly going over their
> memory limit.
> It is to the point they find it hard to get any work done, and I'm
> getting all the complaints and user's frustration.
> When can we expect the cgroups v2 issues to be fixed?

We experienced similar issues after the update to EL9/HTC-23. After
following the hint from the KIT colleagues (thanks!) and changing the
configuration accordingly, we haven't received any more complaints. You
might want to try a similar setting as well (excerpt from our exec node
configuration handling memory enforcement):

# try memory enforcement like the guys from KIT
# see https://www-auth.cs.wisc.edu/lists/htcondor-users/2024-July/msg00103.shtml
CGROUP_HARD_MEMORY_LIMIT_EXPR = 2 * Target.RequestMemory
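
To illustrate what this expression amounts to (a rough sketch only, not part
of our configuration): assuming it is evaluated per job against the job's
RequestMemory, which HTCondor expresses in MB, the hard limit comes out at
twice the requested memory. The helper name hard_memory_limit_mb is
hypothetical and exists only for this example.

```python
def hard_memory_limit_mb(request_memory_mb: int, factor: int = 2) -> int:
    """Return factor * RequestMemory, mirroring `2 * Target.RequestMemory`.

    Assumption: both the request and the resulting limit are in MB.
    """
    if request_memory_mb < 0:
        raise ValueError("request_memory_mb must be non-negative")
    return factor * request_memory_mb


if __name__ == "__main__":
    # A few example request_memory values (MB) and the limit they would imply.
    for request in (1024, 4096, 16384):
        limit = hard_memory_limit_mb(request)
        print(f"request_memory = {request} MB -> hard limit = {limit} MB")
```

In other words, a job would only hit the hard limit once it uses more than
double what it asked for.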

We are running EL9+HTC-23.0.14 btw.

| Andreas Haupt            | E-Mail: andreas.haupt@xxxxxxx
| DESY, Zeuthen            | WWW:    http://www.zeuthen.desy.de/~ahaupt
| Platanenallee 6          | Phone: +49/33762/7-7359
| D-15738 Zeuthen          |
