Hi tj,

On 4/10/20 12:41 AM, John M Knoeller wrote:
> 04/09/20 07:05:14 ERROR "Failed to bind local resource 'GPUs'" at line 1272 ..
>
> There was a known bug in this code when there were multiple GPUS that had the same device name.
> (i.e. the device list was CUDA0,CUDA0) Is that the case here?

Nope, this box only has a single (old) GPU in it:

condor_status -l slot1@xxxxxxxxxxxxxxxxx |awk 'tolower($1)~/gpu/ {print}'
AssignedGPUs = "CUDA0"
ChildGPUs = { 0,0,0,0 }
DetectedGPUs = 1
GPUs = 1
TotalGPUs = 1
TotalSlotGPUs = 1

nvidia-smi -L
GPU 0: GeForce GT 640 (UUID: GPU-27ce3be5-06de-e8b2-419e-6edc9e05b2c7)

But maybe the startd thinks it has an invisible second one, as some strings seem to be incomplete in its logs:

StartLog:04/10/20 02:21:07 unbind_DevIds for slot1.3 before : GPUs:{CUDA0, }{1_5, }

Cheers

Carsten

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185