Hi all,
After a while, I came back to look into this issue, but I am completely lost. We are running HTCondor 24.0.11 on AlmaLinux 9 machines. I can't see why STARTER_HIDE_GPU_DEVICES=True is not working for our GPUs, while it works fine at other sites.
In the logs I see:
10/06/25 13:11:13 (pid:1) Successfully moved procid 1 to cgroup /sys/fs/cgroup/system.slice/htcondor/condor_home_execute_slot2_1@xxxxxxxxxxxx/cgroup.procs
10/06/25 13:11:13 (pid:1) cgroup v2 successfully installed bpf program to limit access to devices
10/06/25 13:11:13 (pid:10613) Create_Process succeeded, pid=10615
But if I try to run nvidia-smi inside an HTCondor job (or any other GPU execution):
What are the names of your GPU devices? Do the names begin with "GPU-"? Are you using MIG?
-greg