The condor_gpu_discovery binary is completely portable, so could you try copying it from a machine that has 8.8.15 installed to one of the machines that is not detecting GPUs, and then running it there interactively?
This will help us determine whether this is really a problem with the condor_gpu_discovery binary or with something else.
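A minimal sketch of that check, under stated assumptions: the install path /usr/libexec/condor is a typical package location and may differ on your systems, "good-node" and the /tmp file name are hypothetical, and run_gpu_discovery is a helper made up for illustration. The -properties and -verbose options are documented for condor_gpu_discovery, but please confirm them against the version you copy.

```shell
# Hypothetical helper: run a condor_gpu_discovery binary by hand.
# The default path is an assumption (typical package install location).
run_gpu_discovery() {
    bin="${1:-/usr/libexec/condor/condor_gpu_discovery}"
    if [ ! -x "$bin" ]; then
        echo "not executable: $bin" >&2
        return 1
    fi
    # -properties prints per-device attributes; -verbose adds detail
    # about what the tool found while probing.
    "$bin" -properties -verbose
}

# Example use on the affected machine (host and file names are hypothetical):
#   scp good-node:/usr/libexec/condor/condor_gpu_discovery /tmp/condor_gpu_discovery.8.8.15
#   run_gpu_discovery /tmp/condor_gpu_discovery.8.8.15   # copied 8.8.15 binary
#   run_gpu_discovery                                    # locally installed binary
```

Comparing the two outputs side by side on the same machine should show whether the difference follows the binary or the host.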
thanks
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Carles Acosta <cacosta@xxxxxx>
Sent: Tuesday, September 28, 2021 3:20 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] GPUs not detected in 9.0.6 version

Dear all,
We have recently migrated our entire pool from HTCondor 8.8.15 to 9.0.6 (keeping, for now, our old PASSWORD security configuration).
Everything is working fine with the exception of two machines that have GeForce GTX 1050 Ti GPUs. We have realized that the GPU is not detected using HTCondor 9.0.6, while it is detected again with version 9.0.5.
# condor_status slot2@xxxxxxxxxxxx -af Gpus DetectedGpus CondorVersion
0 0 $CondorVersion: 9.0.6 Sep 23 2021 BuildID: 557184 PackageID: 9.0.6-1 $

# condor_status slot2@xxxxxxxxxxxx -af Gpus DetectedGpus CondorVersion
1 GPU-c659279d $CondorVersion: 9.0.5 Aug 18 2021 BuildID: 554415 PackageID: 9.0.5-1 $

We have other GPU machines (GeForce RTX 2080 Ti or Tesla V100) that are correctly detected with version 9.0.6; it seems to affect only these older GPUs.
Do you know what is happening? Please let me know if you need further information.
Cheers,
Carles
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es