
Re: [HTCondor-users] HTCondor 24 and STARTER_HIDE_GPU_DEVICES



Hi all,

After a while, I came back to look into this issue, but I am completely lost. We are running HTCondor 24.0.11 on AlmaLinux 9 machines. I can't see why STARTER_HIDE_GPU_DEVICES=True is not working for our GPUs, while it works fine at other sites.

In the logs I see:

10/06/25 13:11:13 (pid:1) Successfully moved procid 1 to cgroup /sys/fs/cgroup/system.slice/htcondor/condor_home_execute_slot2_1@xxxxxxxxxxxx/cgroup.procs
10/06/25 13:11:13 (pid:1) cgroup v2 successfully installed bpf program to limit access to devices
10/06/25 13:11:13 (pid:10613) Create_Process succeeded, pid=10615

But if I try to run nvidia-smi inside an HTCondor job (or any other GPU workload), I get:

No devices were found

On the other hand, the CUDA_VISIBLE_DEVICES variable is correct.

It feels like this is related to our cgroups v2 setup or the GPU driver, but after checking many things, I don't know where else to look. Do you have any suggestions?
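
One way to narrow this down could be to run a small probe as the job itself and see which NVIDIA device nodes the job can actually open. The following is only a minimal sketch: the function names are hypothetical, and it assumes the usual /dev/nvidia* node layout. A device blocked by the cgroup v2 device filter would typically show up as a permission error on open, whereas a missing node would point to the driver or container setup instead.

```python
import glob
import os


def probe_device(path):
    """Try to open a device node read/write; return (ok, detail)."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as err:
        # A permission error here is what a device filter would usually cause.
        return False, f"{type(err).__name__}: {err.strerror}"
    os.close(fd)
    return True, "open succeeded"


def probe_gpu_nodes(pattern="/dev/nvidia*"):
    """Probe every device node matching the pattern (assumed NVIDIA layout)."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        if os.path.isdir(path):
            continue  # skip directories such as /dev/nvidia-caps
        results[path] = probe_device(path)
    return results


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    for path, (ok, detail) in probe_gpu_nodes().items():
        print(f"{path}: {'OK' if ok else 'FAILED'} ({detail})")
```

Comparing the output of this script inside a job with STARTER_HIDE_GPU_DEVICES set to True and to False, and against a run outside HTCondor on the same machine, should show whether the assigned GPU's node is the one being blocked.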

Best regards,

Carles

On Tue, 8 Jul 2025 at 06:02, Carles Acosta <cacosta@xxxxxx> wrote:
Hi Greg,

Thank you very much. I will try with STARTER_HIDE_GPU_DEVICES set to false then.

Cheers,

Carles

On Mon, 7 Jul 2025 at 16:07, Greg Thain <gthain@xxxxxxxxxxx> wrote:
On 7/7/25 05:40, Carles Acosta wrote:
> Dear all,
>
> Am I the only one experiencing issues with the STARTER_HIDE_GPU_DEVICES
> variable set to True in HTCondor 24.0.X?
> Would it be safe to set STARTER_HIDE_GPU_DEVICES = False? The GPU
> machines are the only ones we still have on version 23.0.XX due to
> this issue.


Setting STARTER_HIDE_GPU_DEVICES to false is safe, and we have no other
reports of problems with this knob.

-greg




--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es
