
[HTCondor-users] repeat/divide GPU options not working on condor 23.8.1/23.9.6?



Hi,

I have a machine with 1 GPU, but we added the -repeat 2 -divide 2 options to GPU_DISCOVERY_EXTRA so that it offers 2 GPUs. This worked fine on 23.0.12 and on versions up to 23.7.2.

# condor_status slot2@xxxxxxxxxxxx -af CondorVersion Gpus DetectedGpus
$CondorVersion: 23.0.12 2024-06-13 BuildID: 739441 PackageID: 23.0.12-1 $ 2 GPU-c659279d, GPU-c659279d
# condor_config_val GPU_DISCOVERY_EXTRA MACHINE_RESOURCE_INVENTORY_GPUs
-repeat 2 -divide 2
/usr/libexec/condor/condor_gpu_discovery -properties -repeat 2 -divide 2

However, if we update to 23.8.1 or 23.9.6, it no longer works.

# condor_status slot2@xxxxxxxxxxxx -af CondorVersion Gpus DetectedGpus
$CondorVersion: 23.8.1 2024-06-27 BuildID: 742100 PackageID: 23.8.1-1 GitSHA: 8cf018d1 $ 1 GPU-c659279d, GPU-c659279d
# condor_config_val GPU_DISCOVERY_EXTRA MACHINE_RESOURCE_INVENTORY_GPUs
-repeat 2 -divide 2
/usr/libexec/condor/condor_gpu_discovery -properties -repeat 2 -divide 2

DetectedGpus still lists 2 GPUs, but condor_status shows only one (Gpus = 1). I looked for information about the 23.8.1 release, but I could not find any change related to condor_gpu_discovery:

https://htcondor.readthedocs.io/en/latest/version-history/feature-versions-23-x.html#version-23-8-1
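To check other upgraded nodes for the same symptom, a minimal sketch (not an HTCondor tool; the script and its function names are hypothetical) can compare the Gpus count with the number of entries in DetectedGpus. It assumes the `condor_status -af Name Gpus DetectedGpus` output layout seen above (slot name, then the count, then a comma-separated device list) and that it is run against slots expected to own all of the machine's GPUs, such as slot2 here; dynamic slots can legitimately hold fewer GPUs than DetectedGpus lists.

```python
import sys


def parse_slot_line(line):
    """Split one line of `condor_status -af Name Gpus DetectedGpus` output.

    Assumed layout: slot name, Gpus count, then DetectedGpus as a
    comma-separated list that may itself contain spaces.
    Returns (name, gpus, devices), or None if the line does not fit.
    """
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 3 or not parts[1].isdigit():
        return None  # e.g. "undefined" on slots without GPUs
    name, gpus, detected = parts
    devices = [d.strip() for d in detected.split(",") if d.strip()]
    return name, int(gpus), devices


def find_mismatches(lines):
    """Return (name, gpus, n_detected) for slots where Gpus != len(DetectedGpus)."""
    mismatches = []
    for line in lines:
        parsed = parse_slot_line(line)
        if parsed is None:
            continue
        name, gpus, devices = parsed
        # With -repeat/-divide the same device ID appears several times,
        # so every entry is counted, not just the unique IDs.
        if gpus != len(devices):
            mismatches.append((name, gpus, len(devices)))
    return mismatches


if __name__ == "__main__":
    # Only read when input is piped in, so running it bare does not hang.
    if sys.stdin is not None and not sys.stdin.isatty():
        for name, gpus, n in find_mismatches(sys.stdin):
            print(f"{name}: Gpus={gpus} but DetectedGpus lists {n} entries")
```

For example: `condor_status slot2@<host> -af Name Gpus DetectedGpus | python3 check_gpus.py`. On the 23.0.12 output above it would print nothing; on the 23.8.1 output it would flag slot2 (Gpus=1, 2 entries).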

Is this a bug, or does something new need to be added to the configuration for the -divide/-repeat options to work again?

Thank you in advance.

Cheers,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es