On 10/29/25 01:10, Carles Acosta wrote:
> Hi Alec,
>
> We found a similar issue, although it doesn't seem to be exactly the
> same as yours. In our case, it was caused by having the -not-nested
> option in GPU_DISCOVERY_EXTRA and STARTER_HIDE_GPU_DEVICES set to
> True. When we removed the -not-nested option, everything worked correctly.
>
> Do you have something similar in your configuration? If you set
> STARTER_HIDE_GPU_DEVICES to False, do your jobs run and detect the GPU
> properly?
>
In addition to what Carles said, HTCondor is designed to give each job a
new cgroup, even if the previous job in that slot would have had the
same constraints, so I'm interested to hear if STARTER_HIDE_GPU_DEVICES
= false fixes the immediate problem.
-greg
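
For reference, the two tests suggested in this thread could be sketched as the following execute-node configuration. This is only an illustration: the knob names GPU_DISCOVERY_EXTRA and STARTER_HIDE_GPU_DEVICES and the -not-nested option come from the messages above, but the "-extra" value and the idea of trying one change at a time are assumptions, not a description of anyone's actual setup.

```text
# Sketch only; adapt to your own local configuration.

# Test 1: drop -not-nested from the GPU discovery arguments.
# "-extra" is an assumed placeholder for whatever other options you pass.
GPU_DISCOVERY_EXTRA = -extra

# Test 2 (try separately): stop the starter from hiding GPU devices.
# STARTER_HIDE_GPU_DEVICES = False
```

After editing, `condor_config_val STARTER_HIDE_GPU_DEVICES` on the execute node should show the value the daemons will pick up once they have been reconfigured or restarted.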