[HTCondor-users] Change to long GPU UUIDs
- Date: Thu, 17 Apr 2025 14:30:39 +0200
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: [HTCondor-users] Change to long GPU UUIDs
Good afternoon,
for some very specific reason, we need to change our GPU machine setup
to report long, not shortened, GPU UUIDs (condor_gpu_discovery -uuid).
Currently the STARTD only knows about the short UUIDs, and several
dynamic slots have been created with corresponding "AssignedGPUs" set.
May I safely assume that GPU resources are *not* indexed internally by
their (short or long) UUID, so that I could just run "condor_reconfig"
to switch to long UUIDs? Or would this (a) require something more
drastic (e.g., condor_restart -startd) or (b) make the partitionable
slot lose track of resources already scheduled?
In short, would it be better to wait for the machine to become idle?
Thanks,
Steffen
PS. For the curious: it turns out that "jax" supports long UUIDs in
CUDA_VISIBLE_DEVICES, and a single short UUID, but not multiple
short UUIDs. This isn't documented anywhere, and nobody seems to run
jax code in an HTCondor context.
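
To illustrate the PS, here is a minimal sketch of how a job wrapper might
check which kind of identifiers CUDA_VISIBLE_DEVICES contains before jax
starts. The UUID patterns are assumptions (long form: "GPU-" followed by
8-4-4-4-12 hex groups; short form: "GPU-" followed by 8 hex digits), and
the function names are hypothetical, not part of HTCondor or jax.

```python
import re

# Assumed formats, not taken from HTCondor or NVIDIA documentation:
# long UUID  = "GPU-" + 8-4-4-4-12 hex groups
# short UUID = "GPU-" + the first 8 hex digits only
LONG_UUID = re.compile(r"^GPU-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)
SHORT_UUID = re.compile(r"^GPU-[0-9a-f]{8}$", re.IGNORECASE)


def classify_visible_devices(value):
    """Return (entry, kind) pairs for a CUDA_VISIBLE_DEVICES string.

    kind is one of "long", "short", "index", or "unknown".
    """
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty fields such as a trailing comma
        if LONG_UUID.match(entry):
            kind = "long"
        elif SHORT_UUID.match(entry):
            kind = "short"
        elif entry.isdigit():
            kind = "index"
        else:
            kind = "unknown"
        result.append((entry, kind))
    return result


def has_multiple_short_uuids(value):
    """True if the string lists more than one short UUID (the problematic case above)."""
    kinds = [kind for _, kind in classify_visible_devices(value)]
    return kinds.count("short") > 1
```

In a wrapper script this could be called as
has_multiple_short_uuids(os.environ.get("CUDA_VISIBLE_DEVICES", ""))
to warn before launching a multi-GPU jax job.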
--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Phone: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~