Re: [HTCondor-users] job cannot access gpu - "cgroup v2 could not attach gpu device limiter to cgroup: Operation not permitted"
- Date: Wed, 29 Oct 2025 10:00:31 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] job cannot access gpu - "cgroup v2 could not attach gpu device limiter to cgroup: Operation not permitted"
On 10/29/25 01:10, Carles Acosta wrote:
> Hi Alec,
>
> We found a similar issue, although it doesn't seem to be exactly the
> same as yours. In our case, it was caused by having the -not-nested
> option in GPU_DISCOVERY_EXTRA and STARTER_HIDE_GPU_DEVICES set to
> True. When we removed the -not-nested option, everything worked correctly.
>
> Do you have something similar in your configuration? If you set
> STARTER_HIDE_GPU_DEVICES to False, do your jobs run and detect the GPU
> properly?

In addition to what Carles said, HTCondor is designed to give each job a
new cgroup, even if the previous job in that slot would have had the
same constraints, so I'm interested to hear whether setting
STARTER_HIDE_GPU_DEVICES = false fixes the immediate problem.
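
For reference, the test suggested above might look roughly like the
following on the execute node. This is only a sketch: the file it goes
in, the "before" value shown in the comment, and the remaining
GPU_DISCOVERY_EXTRA arguments are assumptions, not taken from anyone's
actual configuration in this thread.

```
# Hypothetical local config fragment on the GPU execute node.

# Temporarily stop the starter from hiding GPU devices from the job,
# to see whether the "Operation not permitted" error goes away.
STARTER_HIDE_GPU_DEVICES = False

# If -not-nested was added to the discovery arguments, try without it.
# Assumed "before" line, shown only for illustration:
#   GPU_DISCOVERY_EXTRA = $(GPU_DISCOVERY_EXTRA) -not-nested
# Leaving GPU_DISCOVERY_EXTRA at whatever it was before that addition
# matches what Carles describes as having worked at their site.
```

Running condor_config_val STARTER_HIDE_GPU_DEVICES on the node should
show which value is actually in effect before the next job is tried.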
-greg