Hi tj,
On 4/10/20 12:41 AM, John M Knoeller wrote:
> 04/09/20 07:05:14 ERROR "Failed to bind local resource 'GPUs'" at line 1272 ..
> 
> There was a known bug in this code when there were multiple GPUS that had the same device name.  
> (i.e. the device list was  CUDA0,CUDA0)  Is that the case here?
nope, this box only has a single (old) GPU in it:
condor_status -l slot1@xxxxxxxxxxxxxxxxx |awk 'tolower($1)~/gpu/ {print}'
AssignedGPUs = "CUDA0"
ChildGPUs = { 0,0,0,0 }
DetectedGPUs = 1
GPUs = 1
TotalGPUs = 1
TotalSlotGPUs = 1
nvidia-smi -L
GPU 0: GeForce GT 640 (UUID: GPU-27ce3be5-06de-e8b2-419e-6edc9e05b2c7)
But maybe the startd thinks it has an invisible second one, as some
strings seem to be incomplete in its logs:
StartLog:04/10/20 02:21:07 unbind_DevIds for slot1.3 before :
GPUs:{CUDA0, }{1_5, }
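To check other log lines for the same pattern, a rough sketch like the one below could count the empty entries inside the brace lists. The helper name count_empty_entries is made up for illustration, and it assumes the entries are comma-separated inside {...} as in the line above; it is not part of HTCondor.

```shell
# Hypothetical helper (not an HTCondor tool): count empty entries inside
# the {...} lists of a single log line. Assumes comma-separated entries.
# Note: a completely empty list "{}" is also counted as one empty entry.
count_empty_entries() {
    printf '%s\n' "$1" \
        | grep -oE '\{[^}]*\}' \
        | tr -d '{}' \
        | tr ',' '\n' \
        | grep -cE '^[[:space:]]*$'
}

# The line from the StartLog above has one trailing empty entry per list.
count_empty_entries 'GPUs:{CUDA0, }{1_5, }'   # prints 2
```

A non-zero count on a box with a single GPU would at least be consistent with the startd tracking a second, empty device slot, though it does not prove it.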
Cheers
Carsten
-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185