Re: [HTCondor-users] Change to long GPU UUIDs
- Date: Thu, 17 Apr 2025 17:09:23 +0200
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Change to long GPU UUIDs
On Thu, 2025-04-17 at 14:39:48 +0000, HTCondor Users Mailinglist wrote:
> You will need to restart, because the argument that enables long uuids is an argument to condor_gpu_discovery, and that is only run by the STARTD on startup, not on reconfig.
Thanks, that is what I was afraid of - restarting the STARTD is not an
option at the moment.
> The STARTD also does not track GPUs by the long ids internally, but even if it did, a restart would be needed because of the above reason.
>
> Our empirical testing shows that the short uuids are sufficiently unique to prevent any confusion on a single machine. Have you found a machine where that is not true?
They just don't work with certain software:
> PS. For the curious: it turns out that "jax" supports long UUIDs in
> CUDA_VISIBLE_DEVICES (which we want to provide), or a *single* short
> UUID, but not multiple short UUIDs.
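
For anyone who wants to spot affected jobs, here is a minimal sketch of a check for the problematic case described above (more than one short UUID in CUDA_VISIBLE_DEVICES). The helper names are hypothetical, and the patterns assume that a short UUID looks like "GPU-" plus 8 hex digits and a long one like "GPU-" plus the usual 8-4-4-4-12 hex layout; adjust them if your IDs differ.

```python
import os
import re

# Assumed formats (not taken from HTCondor or jax documentation):
#   long:  GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
#   short: GPU-xxxxxxxx
LONG_UUID = re.compile(
    r"^GPU-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE
)
SHORT_UUID = re.compile(r"^GPU-[0-9a-f]{8}$", re.IGNORECASE)


def classify(entry):
    """Label one CUDA_VISIBLE_DEVICES entry as 'long', 'short' or 'other'."""
    if LONG_UUID.match(entry):
        return "long"
    if SHORT_UUID.match(entry):
        return "short"
    return "other"  # e.g. a plain device index such as "0"


def has_multiple_short_uuids(value):
    """True if the comma-separated value holds more than one short UUID."""
    entries = [e.strip() for e in value.split(",") if e.strip()]
    return sum(classify(e) == "short" for e in entries) > 1


if __name__ == "__main__":
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if has_multiple_short_uuids(value):
        print("multiple short UUIDs in CUDA_VISIBLE_DEVICES:", value)
```

Run inside a job's environment, this only reports the case; it does not rewrite the variable or map short IDs to long ones.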
Thanks for the clarification!
Best,
Steffen
--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~