
Re: [HTCondor-users] HTCondor 24 and STARTER_HIDE_GPU_DEVICES



Hi Greg,

All the GPU device names begin with GPU-*. We are not using MIG.

Thank you.

Cheers,

Carles

On Mon, 6 Oct 2025 at 22:36, Greg Thain <gthain@xxxxxxxxxxx> wrote:


On 10/6/25 7:55 AM, Carles Acosta wrote:
Hi all,

After a while, I came back to look into this issue, but I am completely lost. We are running HTCondor 24.0.11 on AlmaLinux 9 machines. I can't see why STARTER_HIDE_GPU_DEVICES=True is not working for our GPUs, while it works fine at other sites.

In the logs I see:

10/06/25 13:11:13 (pid:1) Successfully moved procid 1 to cgroup /sys/fs/cgroup/system.slice/htcondor/condor_home_execute_slot2_1@xxxxxxxxxxxx/cgroup.procs
10/06/25 13:11:13 (pid:1) cgroup v2 successfully installed bpf program to limit access to devices
10/06/25 13:11:13 (pid:10613) Create_Process succeeded, pid=10615

But if I try to run nvidia-smi inside an HTCondor job (or any other GPU execution):


What are the names of your GPU devices? Do the names begin with "GPU-"? Are you using MIG?

-greg
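
One way to answer the device-name questions above is to look at the UUIDs printed by `nvidia-smi -L`, which lists one device per line with a `(UUID: ...)` field. The following is only a minimal sketch, not part of HTCondor: the helper names `read_listing`, `device_uuids` and `summarize` are made up for illustration, and it assumes that whole GPUs report UUIDs starting with "GPU-" and MIG instances report UUIDs starting with "MIG-".

```python
import re
import subprocess

# Matches the UUID field on each line of `nvidia-smi -L` output, e.g.
#   GPU 0: <model name> (UUID: GPU-1a2b3c4d-...)
UUID_RE = re.compile(r"\(UUID:\s*([^)\s]+)\)")


def read_listing() -> str:
    """Run `nvidia-smi -L` and return its text output (needs the NVIDIA driver tools)."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return result.stdout


def device_uuids(listing: str) -> list[str]:
    """Return every UUID found in the given `nvidia-smi -L` text, in order."""
    return UUID_RE.findall(listing)


def summarize(listing: str) -> dict[str, list[str]]:
    """Group UUIDs by prefix: 'GPU-' (whole devices), 'MIG-' (MIG instances), or 'other'."""
    groups: dict[str, list[str]] = {"GPU-": [], "MIG-": [], "other": []}
    for uuid in device_uuids(listing):
        if uuid.startswith("GPU-"):
            groups["GPU-"].append(uuid)
        elif uuid.startswith("MIG-"):
            groups["MIG-"].append(uuid)
        else:
            groups["other"].append(uuid)
    return groups
```

Calling `summarize(read_listing())` once directly on the execute node and once from inside a job gives two lists that can be compared: the first shows how the devices are named, the second shows which devices the job can actually see.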



--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es