Hi,
I have a machine with 1 GPU, but we added the
-repeat 2 -divide 2 options to GPU_DISCOVERY_EXTRA so
that it offers 2 GPUs. This worked fine on 23.0.12 and
up to 23.7.2.
# condor_status slot2@xxxxxxxxxxxx -af CondorVersion Gpus DetectedGpus
$CondorVersion: 23.0.12 2024-06-13 BuildID: 739441 PackageID: 23.0.12-1 $ 2 GPU-c659279d, GPU-c659279d
# condor_config_val GPU_DISCOVERY_EXTRA MACHINE_RESOURCE_INVENTORY_GPUs
-repeat 2 -divide 2
/usr/libexec/condor/condor_gpu_discovery -properties -repeat 2 -divide 2
However, after updating to 23.8.1 or 23.9.6, this no
longer works.
# condor_status slot2@xxxxxxxxxxxx -af CondorVersion Gpus DetectedGpus
$CondorVersion: 23.8.1 2024-06-27 BuildID: 742100 PackageID: 23.8.1-1 GitSHA: 8cf018d1 $ 1 GPU-c659279d, GPU-c659279d
# condor_config_val GPU_DISCOVERY_EXTRA MACHINE_RESOURCE_INVENTORY_GPUs
-repeat 2 -divide 2
/usr/libexec/condor/condor_gpu_discovery -properties -repeat 2 -divide 2
DetectedGpus still lists 2 GPUs, but Gpus is now 1, so
condor_status shows only one GPU for the slot. I have
looked through the information about the 23.8.1
release, but I could not find any change related to
condor_gpu_discovery.
Is this a bug, or does something new have to be added
to the config for the -divide/-repeat options to work
again?
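In case it helps to reproduce the check on other nodes, here is a minimal
sketch of how the mismatch could be detected automatically. It assumes the
output of `condor_status -af Gpus DetectedGpus` has the same shape as above
(the Gpus count first, then the comma-separated DetectedGpus list). The
function name gpu_mismatch is hypothetical, not part of HTCondor.

```python
def gpu_mismatch(af_line):
    """Parse one output line of `condor_status -af Gpus DetectedGpus`.

    Assumed format (taken from the output quoted above):
        "<Gpus> <id>, <id>, ..."
    Returns (advertised_count, detected_count, mismatch_flag).
    """
    # The first whitespace-separated token is the advertised Gpus value.
    count_text, _, detected_text = af_line.strip().partition(" ")
    advertised = int(count_text)
    # The remainder is the comma-separated DetectedGpus list.
    detected = [g.strip() for g in detected_text.split(",") if g.strip()]
    return advertised, len(detected), advertised != len(detected)


# Lines as seen on 23.0.12 and on 23.8.1, respectively.
print(gpu_mismatch("2 GPU-c659279d, GPU-c659279d"))  # (2, 2, False)
print(gpu_mismatch("1 GPU-c659279d, GPU-c659279d"))  # (1, 2, True)
```

The second call flags the 23.8.1 case, where two GPUs are detected but only
one is advertised.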
Thank you in advance.
Cheers,
Carles
--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10