Hi Cole,

sorry for the late reply.

On 5/7/26 18:20, Cole Bollig wrote:
> As for potential solutions, I think the only way is a script that is executed via configuration to set referable configuration macros with the exact GPU counts. Something like:
Yeah, I figured as much :).

As a GPU may break after many days of operation, I think I'll query the GPU state from a timer, have an external script update a config snippet, and then run condor_reconfig. That way, I should be able to ensure that new jobs only land on GPUs that are still working.
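
For illustration, here is a minimal Python sketch of that timer-driven script. Everything specific in it is an assumption on my side and not something from this thread: the macro name WORKING_GPU_COUNT, the snippet path passed on the command line, and the idea that a broken GPU simply disappears from the nvidia-smi listing (which may not hold for every failure mode). It only rewrites the snippet and calls condor_reconfig when the count actually changed.

```python
#!/usr/bin/env python3
"""Sketch: refresh an HTCondor config snippet with the number of visible GPUs."""
import subprocess
import sys

# Hypothetical macro name; pick whatever your slot definitions refer to.
MACRO_NAME = "WORKING_GPU_COUNT"


def parse_gpu_uuids(nvidia_smi_output):
    """Return one UUID per non-empty line of the nvidia-smi output."""
    return [ln.strip() for ln in nvidia_smi_output.splitlines() if ln.strip()]


def query_gpu_uuids(timeout=30):
    """Ask nvidia-smi for GPU UUIDs; treat any failure as 'no GPUs' (assumption)."""
    cmd = ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_uuids(result.stdout)


def render_snippet(gpu_uuids):
    """Build the text of the config snippet."""
    header = "# Generated by a hypothetical GPU health check\n"
    return f"{header}{MACRO_NAME} = {len(gpu_uuids)}\n"


def write_if_changed(path, content):
    """Write content to path; return True only if the file was changed."""
    try:
        with open(path) as fh:
            if fh.read() == content:
                return False
    except FileNotFoundError:
        pass
    with open(path, "w") as fh:
        fh.write(content)
    return True


def refresh(snippet_path):
    """Update the snippet and reconfigure HTCondor only when needed."""
    if write_if_changed(snippet_path, render_snippet(query_gpu_uuids())):
        subprocess.run(["condor_reconfig"], check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        refresh(sys.argv[1])
    else:
        print("usage: gpu_snippet.py <path-to-config-snippet>", file=sys.stderr)
```

A systemd timer or cron entry would then call the script with the path of a file inside the local config directory. Skipping condor_reconfig when nothing changed keeps the timer cheap on healthy nodes.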
Another quick, somewhat related question:

So far, we have used SLOT<N>_CPU_AFFINITY to enforce NUMA boundaries between partitionable slots (at least that's what I think it is doing). We do that because all our processing is quite sensitive to memory bandwidth and we do not want to move vast amounts of data between CPUs.

However, I just saw this in the documentation:

"This configuration variable is replaced by ASSIGN_CPU_AFFINITY. Do not enable this configuration variable unless using glidein or another unusual setup."

But as ASSIGN_CPU_AFFINITY is a simple boolean, how can I ensure that certain CPU cores are only used by a specific slot?
Cheers and thanks a lot in advance.

Carsten

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185