
Re: [HTCondor-users] job cannot access gpu - "cgroup v2 could not attach gpu device limiter to cgroup: Operation not permitted"



On 10/29/25 01:10, Carles Acosta wrote:
Hi Alec,

We found a similar issue, although it doesn't seem to be exactly the same as yours. In our case, it was caused by having the -not-nested option in GPU_DISCOVERY_EXTRA together with STARTER_HIDE_GPU_DEVICES set to True. When we removed the -not-nested option, everything worked correctly.

Do you have something similar in your configuration? If you set STARTER_HIDE_GPU_DEVICES to False, do your jobs run and detect the GPU properly?
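A minimal sketch of the two tests Carles describes, written as a local configuration fragment on the execute node. The file name is hypothetical, and no full GPU_DISCOVERY_EXTRA value is given because the original config isn't shown; the idea is to keep your site's other options and drop only -not-nested.

```
# Hypothetical file: /etc/condor/config.d/99-gpu-debug.conf

# Test 1: stop the starter from hiding GPUs that are not assigned to the job
# (this is what uses the cgroup v2 device limiter in the error message)
STARTER_HIDE_GPU_DEVICES = False

# Test 2 (alternative): leave STARTER_HIDE_GPU_DEVICES at True, but make sure
# GPU_DISCOVERY_EXTRA no longer contains -not-nested. Keep any other options
# your site already sets there; only -not-nested is removed.
```

You can check the value the daemons will see with `condor_config_val STARTER_HIDE_GPU_DEVICES` and then run `condor_reconfig` on the node; changes to GPU discovery options may need a restart of the condor_startd rather than just a reconfig.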

In addition to what Carles said: HTCondor is designed to give each job a new cgroup, even if the previous job in that slot had the same constraints. So I'm interested to hear whether setting STARTER_HIDE_GPU_DEVICES = false fixes the immediate problem.

-greg