
[HTCondor-users] HTCondor jobs & cgroups



Hi, we are running HTCondor ($CondorVersion: 23.0.10 2024-05-09 BuildID: 731952 PackageID: 23.0.10-1 $) on Rocky Linux el9.
We have noticed that some jobs appear to be escaping their cgroup. We expect jobs to run within the “htcondor” cgroup tree, but some are instead running under the condor.service cgroup tree.
Resource accounting fails for any job that ends up under the condor.service tree.
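A minimal sketch for listing which processes sit in which tree. It assumes the unified cgroup v2 hierarchy (the el9 default), where /proc/<pid>/cgroup holds a single line of the form "0::<path>". The tree names are taken from the description above and are matched as path components. A path containing "htcondor" is counted as the job tree even if it is nested below condor.service. The HTCondor daemons themselves are expected to live in condor.service, so the output still needs to be inspected for job processes.

```python
#!/usr/bin/env python3
"""Group running processes by the cgroup v2 tree they belong to."""
import os
from collections import defaultdict

# Assumed tree names, taken from the report; adjust to the local layout.
JOB_TREE = "htcondor"
SERVICE_TREE = "condor.service"


def cgroup_path(pid):
    """Return the cgroup v2 path of pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/cgroup") as fh:
            for line in fh:
                # cgroup v2 entries look like "0::/system.slice/condor.service"
                hierarchy, _, rest = line.strip().partition(":")
                controllers, _, path = rest.partition(":")
                if hierarchy == "0" and controllers == "":
                    return path
    except OSError:
        # Process exited, or we lack permission to read it.
        return None
    return None


def classify(path):
    """Label a cgroup path by the tree it sits in (job tree checked first)."""
    parts = path.strip("/").split("/")
    if JOB_TREE in parts:
        return JOB_TREE
    if SERVICE_TREE in parts:
        return SERVICE_TREE
    return "other"


def scan():
    """Map each tree label to a list of (pid, cgroup path) pairs."""
    groups = defaultdict(list)
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            path = cgroup_path(entry)
            if path is not None:
                groups[classify(path)].append((int(entry), path))
    return groups


if __name__ == "__main__":
    # Print every process found under condor.service (daemons included).
    for pid, path in sorted(scan().get(SERVICE_TREE, [])):
        print(pid, path)
```

Running it periodically on an execute node, and cross-referencing the PIDs with `ps -o pid,user,cmd`, may help reveal whether the misplaced processes share an owner, a universe, or a slot type.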

We are using the default systemd unit file settings (i.e. “Delegate=yes” is set).

In our configuration for startd we have set: CGROUP_MEMORY_LIMIT_POLICY = hard.

As far as we can tell, there is no clear pattern to which jobs end up in the condor.service cgroup. It appears fairly “random”, although there may be something we haven’t spotted yet.

Has anyone else seen similar behaviour or got any suggestions for how we might troubleshoot this?

Thanks,

Dan Whitehouse