You will need to restart, because the option that enables long UUIDs is an argument to condor_gpu_discovery, and that is only run by the STARTD on startup, not on reconfig.
The STARTD also does not track GPUs by the long ids internally, but even if it did, a restart would still be needed for the reason above.
Our empirical testing shows that the short UUIDs are sufficiently unique to prevent any confusion on a single machine. Have you found a machine where that is not true?
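If you want to check this on your own machines, here is a minimal sketch in Python. It assumes the short id is "GPU-" followed by the first 8 hex digits of the long UUID. The function names and the sample UUIDs are made up for illustration; the real long UUIDs would come from the output of condor_gpu_discovery -uuid.

```python
from collections import defaultdict


def short_uuid(long_uuid: str) -> str:
    """Assumed short form: 'GPU-' plus the first 8 hex digits of the UUID."""
    prefix, _, rest = long_uuid.partition("-")
    return f"{prefix}-{rest[:8]}"


def find_collisions(long_uuids):
    """Group long ids by their short form and keep only groups that clash."""
    groups = defaultdict(list)
    for uuid in long_uuids:
        groups[short_uuid(uuid)].append(uuid)
    return {short: longs for short, longs in groups.items() if len(longs) > 1}


# Hypothetical UUIDs, for illustration only.
gpus = [
    "GPU-c4a646d7-aa14-1dd1-f1b0-57288cda864d",
    "GPU-6a96bd13-70bc-6494-6d62-1b77a9a7f29f",
]
print(find_collisions(gpus))
```

An empty result means no two GPUs on the machine share a short id.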
-tj
From: HTCondor-users on behalf of Steffen Grunewald
Sent: Thursday, April 17, 2025 7:30 AM
To: HTCondor Users Mailinglist
Subject: [HTCondor-users] Change to long GPU UUIDs

Good afternoon,
for some very specific reason, we need to change our GPU machine setup to report long, not shortened, GPU UUIDs (condor_gpu_discovery -uuid). Currently the STARTD only knows about the short UUIDs, and several dynamic slots have been created with corresponding "AssignedGPUs" set.

May I safely assume that GPU resources are indexed internally *not* by their (short or long) UUID, so I could just do a "condor_reconfig" to switch to long UUIDs - or would this (a) require something more drastic (e.g., condor_restart -startd) or (b) make the partitionable slot lose track of resources already scheduled?

In short, would it be better to wait for the machine to become idle?

Thanks,
Steffen

PS. For the curious: it turns out that "jax" supports long UUIDs in CUDA_VISIBLE_DEVICES, and a single short UUID, but not multiple short UUIDs. This isn't documented anywhere, and nobody seems to run jax code in an HTCondor context.

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de