I got curious about the cgroup device controller following Todd's
comment on Joan's suggestion to use it for GPUs (I had not noticed
device controllers before) and looked at how it works in cgroups v2.
However, the cgroup v2 "Device controller" section does not read as very
encouraging: there is no simple pseudo-file interface, only eBPF.
 https://docs.kernel.org/admin-guide/cgroup-v2.html
But maybe it could be an option for admins to learn a bit more about
eBPF (if an admin could inject their own small eBPF programlets in
addition to the general Condor job control)?
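
To make the idea a bit more concrete, here is a rough pure-Python model of
the decision such a programlet would make. It is not a real eBPF program
and cannot be attached to a cgroup; it only mirrors how the kernel encodes
the request (access_type, major, minor) for a cgroup device program, which
returns 1 to allow and 0 to deny. The policy itself, the names
GUARDED_MAJOR and ALLOWED_GPU_MINORS, and the assumption that the GPUs use
character major 195 are mine, for illustration only.

```python
# Constants mirror the kernel UAPI header linux/bpf.h.
BPF_DEVCG_ACC_MKNOD = 1 << 0
BPF_DEVCG_ACC_READ = 1 << 1
BPF_DEVCG_ACC_WRITE = 1 << 2
BPF_DEVCG_DEV_BLOCK = 1 << 0
BPF_DEVCG_DEV_CHAR = 1 << 1

# Hypothetical policy values (assumptions, not taken from HTCondor):
GUARDED_MAJOR = 195            # assumed character major of the GPU nodes
ALLOWED_GPU_MINORS = {0, 255}  # assumed: the assigned GPU plus the control node


def device_access_allowed(access_type, major, minor,
                          allowed_minors=ALLOWED_GPU_MINORS):
    """Return 1 (allow) or 0 (deny), as a cgroup device program would."""
    # The kernel packs the request as (BPF_DEVCG_ACC_* << 16) | BPF_DEVCG_DEV_*.
    dev_type = access_type & 0xFFFF
    if dev_type != BPF_DEVCG_DEV_CHAR:
        return 1  # this sketch leaves block devices alone
    if major != GUARDED_MAJOR:
        return 1  # not a guarded GPU node
    return 1 if minor in allowed_minors else 0
```

With this policy a read of 195:0 would be allowed and a read of 195:1
denied, while unrelated devices pass through untouched.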
Cheers,
 Thomas
On 16/05/2023 17.51, Todd Tannenbaum via HTCondor-users wrote:
On 5/15/2023 7:20 AM, Joan Josep Piles-Contreras wrote:
Hi,
Some tools directly ignore CUDA_VISIBLE_DEVICES. For instance,
anything not using CUDA but the graphics subsystem of the card, like
headless rendering using EGL.
Also, some libraries/frameworks override CUDA_VISIBLE_DEVICES by
default, so in our experience it's not as reliable as we'd like.
What we do is to use a job wrapper that calls a small suid tool that
uses device cgroups [1] to make sure only the assigned GPU(s) can be
accessed.
This can't be bypassed by the user, and it has the added benefit of
"hiding" any other GPUs in the system.
Ideally this could be part of the starter's cgroup setup (when
enabled). I think Slurm can do something similar, but in the
meantime it's working quite well for us.
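
As a rough illustration of the approach described above (not the actual
tool), the sketch below builds the entries a wrapper could write to the
cgroup v1 devices controller. It assumes cgroup v1, where each rule has
the form "type major:minor access" and is written to devices.allow or
devices.deny in the job's cgroup. The helper names format_rule,
device_rule and gpu_deny_rules are hypothetical, and writing the rules
(which needs root) is left out.

```python
import os
import stat


def format_rule(kind, major, minor, access="rwm"):
    """Format one cgroup v1 devices rule, e.g. 'c 195:1 rwm'."""
    return f"{kind} {major}:{minor} {access}"


def device_rule(path, access="rwm"):
    """Build the rule for an existing device node by inspecting it."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError(f"{path} is not a device node")
    return format_rule(kind, os.major(st.st_rdev), os.minor(st.st_rdev), access)


def gpu_deny_rules(all_gpus, assigned_gpus):
    """Rules for devices.deny: every GPU node not assigned to the job."""
    assigned = set(assigned_gpus)
    return [device_rule(p) for p in all_gpus if p not in assigned]
```

A wrapper would then write each returned line to devices.deny of the job's
cgroup before starting the job; the cgroup path depends on the local setup.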
Hi Joan,
I like the above idea, thank you for sharing. We have considered
changing the ownership on the gpu /dev files, but I like the idea of
using device cgroups much better. Could you please email me
(off-group is fine) your wrapper/suid tool for reference, and I will
see about incorporating it directly into HTCondor's native cgroup
support. Or if you are interested/willing to make a GitHub pull
request to do the same, that is also welcome :).
Thank you Joan,
regards,
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/