Re: [HTCondor-users] Job access to own Job and Machine ClassAds?
- Date: Tue, 16 May 2023 10:51:26 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Job access to own Job and Machine ClassAds?
On 5/15/2023 7:20 AM, Joan Josep Piles-Contreras wrote:
Hi,
Some tools simply ignore CUDA_VISIBLE_DEVICES. For instance,
anything that uses the card's graphics subsystem rather than CUDA,
such as headless rendering with EGL.
Also, some libraries and frameworks override CUDA_VISIBLE_DEVICES by
default, so in our experience it's not as reliable as we'd like.
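To make the override problem concrete: CUDA_VISIBLE_DEVICES is only an environment variable that the CUDA runtime reads when it initializes, so any code running inside the job can rewrite it first. The sketch below uses a hypothetical framework function (`hypothetical_framework_init`, not a real library call) to show that nothing enforces the value the batch system set.

```python
import os


def hypothetical_framework_init(preferred="0"):
    # Hypothetical library code that "helpfully" pins itself to a device by
    # rewriting the variable before CUDA initializes. Nothing prevents this.
    os.environ["CUDA_VISIBLE_DEVICES"] = preferred


# Suppose the batch system assigned GPU 2 to this job.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

hypothetical_framework_init()
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 0
```

Since the variable is advisory, enforcement has to happen below the process, at the device level.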
What we do is use a job wrapper that calls a small suid tool
that uses device cgroups [1] to make sure only the assigned GPU(s)
can be accessed.
This can't be bypassed by the user, and it has the added benefit
of "hiding" any other GPUs in the system.
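The device-cgroup approach can be sketched as follows. This is a minimal sketch, not Joan's actual tool, and it assumes: the cgroup v1 `devices` controller (cgroup v2 has no `devices.deny` file and enforces device access through eBPF programs instead), a job that already runs in its own cgroup, and assigned GPUs identified by their /dev/nvidiaN index. The helper names (`gpu_nodes`, `deny_rules`, `apply_deny_rules`) are hypothetical. On typical installations each per-GPU node /dev/nvidiaN is a character device with major 195 and its own minor number, so denying the minors of unassigned GPUs blocks opening them while leaving shared nodes such as /dev/nvidiactl reachable.

```python
import os
import re
import stat

# Per-GPU NVIDIA nodes: /dev/nvidia0, /dev/nvidia1, ...
# Deliberately skips shared nodes such as nvidiactl and nvidia-uvm.
GPU_NODE = re.compile(r"^nvidia(\d+)$")


def gpu_nodes(dev_dir="/dev"):
    """Return (index, major, minor) for each /dev/nvidiaN character device."""
    nodes = []
    for name in sorted(os.listdir(dev_dir)):
        match = GPU_NODE.match(name)
        if match is None:
            continue
        st = os.stat(os.path.join(dev_dir, name))
        if stat.S_ISCHR(st.st_mode):
            nodes.append((int(match.group(1)),
                          os.major(st.st_rdev),
                          os.minor(st.st_rdev)))
    return nodes


def deny_rules(nodes, assigned):
    """Build one cgroup v1 devices.deny entry per GPU NOT assigned to the job."""
    keep = set(assigned)
    return ["c {}:{} rwm".format(major, minor)
            for index, major, minor in nodes
            if index not in keep]


def apply_deny_rules(cgroup_dir, rules):
    """Write the rules into the job's cgroup (needs root, hence a suid helper).

    Each rule is written separately, mirroring
    `echo 'c M:N rwm' > devices.deny`.
    """
    path = os.path.join(cgroup_dir, "devices.deny")
    for rule in rules:
        with open(path, "w") as f:
            f.write(rule)
```

For example, with three illustrative GPU nodes and only GPU 1 assigned, `deny_rules([(0, 195, 0), (1, 195, 1), (2, 195, 2)], [1])` returns `['c 195:0 rwm', 'c 195:2 rwm']`. Denying individual minors, rather than a `c 195:* rwm` wildcard followed by selective allows, avoids depending on how the v1 controller reconciles overlapping allow and deny entries. A /dev/nvidiaN index may not match the CUDA device index, so mapping the job's assigned GPUs to device nodes is a further assumption here.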
Ideally this could be part of the starter's cgroup setup (when
enabled); I think Slurm can do something similar. In the
meantime, it's working quite well for us.
Hi Joan,
I like the above idea; thank you for sharing. We have considered
changing the ownership of the GPU /dev files, but I like the idea of
using device cgroups much better. Could you please email me
(off-list is fine) your wrapper/suid tool for reference? I will
see about incorporating it directly into HTCondor's native cgroup
support. Or, if you are interested in and willing to make a GitHub pull
request to do the same, that is also welcome :).
Thank you, Joan.
Regards,
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences