Hi,
I have a machine with 1 GPU, but we added the -repeat 2 -divide 2 options to GPU_DISCOVERY_EXTRA so that it offers 2 GPUs. This worked fine on 23.0.12 and up to 23.7.2.
# condor_status slot2@xxxxxxxxxxxx -af CondorVersion Gpus DetectedGpus
$CondorVersion: 23.0.12 2024-06-13 BuildID: 739441 PackageID: 23.0.12-1 $ 2 GPU-c659279d, GPU-c659279d
# condor_config_val GPU_DISCOVERY_EXTRA MACHINE_RESOURCE_INVENTORY_GPUs
-repeat 2 -divide 2
/usr/libexec/condor/condor_gpu_discovery -properties -repeat 2 -divide 2
However, if we update to 23.8.1 or 23.9.6, this no longer works.
# condor_status slot2@xxxxxxxxxxxx -af CondorVersion Gpus DetectedGpus
$CondorVersion: 23.8.1 2024-06-27 BuildID: 742100 PackageID: 23.8.1-1 GitSHA: 8cf018d1 $ 1 GPU-c659279d, GPU-c659279d
# condor_config_val GPU_DISCOVERY_EXTRA MACHINE_RESOURCE_INVENTORY_GPUs
-repeat 2 -divide 2
/usr/libexec/condor/condor_gpu_discovery -properties -repeat 2 -divide 2
Two GPUs are detected (DetectedGpus), but condor_status reports Gpus as only 1. I searched for information about the 23.8.1 release, but I could not find any change related to condor_gpu_discovery.
Is this a bug, or does something new have to be added to the config for the -divide/-repeat options to work again?
Thank you in advance.
Cheers,
Carles
-- Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10