
Re: [HTCondor-users] Change to long GPU UUIDs



On Thu, 2025-04-17 at 14:39:48 +0000, HTCondor Users Mailinglist wrote:
> You will need to restart, because the argument that enables long uuids is an argument to condor_gpu_discovery, and that is only run by the STARTD on startup, not on reconfig.

Thanks, that is what I was afraid of - restarting the STARTD is not an
option at the moment.

> The STARTD also does not track GPUs by the long ids internally, but even if it did, a restart would be needed because of the above reason.
> 
> Our empirical testing shows that the short uuids are sufficiently unique to prevent any confusion on a single machine.  Have you found a machine where that is not true?

They are unique enough, they just don't work with certain software:

> PS. For the curious: it turns out that "jax" supports long UUIDs in
> CUDA_VISIBLE_DEVICES (which we want to provide), or a *single* short
> UUID, but not multiple short UUIDs.
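
For anyone hitting the same limitation without being able to restart the STARTD, one possible job-side workaround is to expand the short UUIDs into long ones before jax is imported. The following is only a minimal sketch, not something HTCondor provides: the function names `expand_short_uuids` and `fix_environment` are hypothetical, and it assumes that `nvidia-smi -L` is available on the execute node and prints one `(UUID: GPU-...)` entry per device.

```python
import os
import re
import subprocess

# Matches the long UUID that `nvidia-smi -L` prints for each GPU, e.g.
# "GPU 0: <name> (UUID: GPU-c4a646d7-aa14-1dd1-f1b0-57288cda864d)"
UUID_RE = re.compile(r"UUID:\s*(GPU-[0-9a-fA-F-]+)")


def expand_short_uuids(visible, smi_listing):
    """Replace each (short) UUID in a CUDA_VISIBLE_DEVICES string with the
    unique long UUID from the listing that starts with it.

    Hypothetical helper: raises ValueError if an entry matches no GPU or
    more than one GPU, rather than guessing.
    """
    long_ids = UUID_RE.findall(smi_listing)
    expanded = []
    for token in visible.split(","):
        token = token.strip()
        if not token:
            continue
        matches = [u for u in long_ids if u.startswith(token)]
        if len(matches) != 1:
            raise ValueError(
                "cannot resolve %r: %d matching GPUs" % (token, len(matches))
            )
        expanded.append(matches[0])
    return ",".join(expanded)


def fix_environment():
    """Rewrite CUDA_VISIBLE_DEVICES in place; call this before importing jax."""
    listing = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    current = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    os.environ["CUDA_VISIBLE_DEVICES"] = expand_short_uuids(current, listing)
```

A job would call `fix_environment()` at the very top of its script, before `import jax`, so that the CUDA runtime only ever sees the long form. Entries that are already long UUIDs pass through unchanged because they match themselves.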

Thanks for the clarification!

Best,
 Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~