
Re: [HTCondor-users] HTCondor jobs & cgroups



On 5/31/24 10:29, Whitehouse, Dan wrote:

Hi Greg,

Thanks very much for your prompt reply.

Could you advise whether this fix is likely to be backported to 23.0.12?

We are a little reticent about deploying feature releases but we aren't going to rule that out and I may well end up taking your advice.  However if the fix is likely to be ported to the next LTS we may prefer to wait.


Hi Dan:

I appreciate your caution with respect to upgrades.  I'm not against backporting this, but I'd like to see it get more real-world use in the field before backporting to stable, so I can't say when it might go in.


-greg


 

Thanks,

 

Dan

 

On 5/31/24 07:31, Whitehouse, Dan wrote:

Hi, we are running htcondor ($CondorVersion: 23.0.10 2024-05-09 BuildID: 731952 PackageID: 23.0.10-1 $) on Rocky Linux el9.
We have noticed that some jobs appear to be escaping their cgroup. We expect the jobs to run within the "htcondor" cgroup tree, but instead we see some jobs running under the condor.service cgroup tree.

 

Hi Dan:

We've recently fixed some race conditions with cgroup v2 systems.  This code is in 23.7 -- would it be possible for you to try with this version?  Also note that in 23.8, in order to properly comply with the systemd "one writer" rule, the cgroup v2 htcondor tree will be rooted under the condor.service tree created by systemd for the daemons.
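
For anyone wanting to check which tree a given job process ended up in, here is a rough sketch in Python. It reads /proc/<pid>/cgroup, takes the cgroup v2 entry (the line beginning "0::"), and looks for a path component named "htcondor" or "condor.service". The function names are made up for illustration, and the assumption that the job tree has a path component literally named "htcondor" comes only from the description in this thread, so adjust it to what you see on your own nodes. Because "htcondor" is tested first, a tree rooted under condor.service (the 23.8 layout) is still reported as the htcondor tree.

```python
from pathlib import Path


def cgroup_v2_path(proc_cgroup_text):
    """Return the cgroup v2 path from the text of /proc/<pid>/cgroup, or None."""
    for line in proc_cgroup_text.splitlines():
        # The cgroup v2 entry has hierarchy ID 0 and an empty controller
        # list, so the line looks like "0::<path>".
        if line.startswith("0::"):
            return line[3:]
    return None


def classify(path):
    """Label a cgroup path as 'htcondor', 'condor.service', or 'other'.

    Assumption: the job tree contains a component named 'htcondor'.
    That component is tested first so that a tree rooted under
    condor.service still counts as the htcondor tree.
    """
    parts = [p for p in (path or "").split("/") if p]
    if "htcondor" in parts:
        return "htcondor"
    if "condor.service" in parts:
        return "condor.service"
    return "other"


def classify_pid(pid):
    """Classify a live process by reading its /proc entry (Linux only)."""
    text = Path(f"/proc/{pid}/cgroup").read_text()
    return classify(cgroup_v2_path(text))
```

Calling classify_pid() on the PID of a running job and getting "condor.service" back would match the escape described above; the slot-level path names under the tree will differ from site to site.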

-greg