condor_gpu_discovery does not access any registry keys. It just loads cudart.dll or nvcuda.dll and calls functions from those DLLs. If the registry is being accessed, it is by those DLLs, so you would need to refer to NVIDIA's documentation about registry keys.

What does condor_gpu_discovery report as your driver version and runtime version? The values that condor_gpu_discovery reports come from these DLLs. If a value is wrong, it is because your version of the CUDA libraries is incompatible with programs built with older versions of their SDK. We are looking into what it would take to make our gpu_discovery work with CUDA 10 without breaking backward compatibility, but so far we do not have a solution for this problem.
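To cross-check the driver version independently of condor_gpu_discovery, the same kind of lookup can be sketched in Python with ctypes. This is only an illustration, not how condor_gpu_discovery itself is implemented. The helper names and the default library names (nvcuda.dll on Windows, libcuda.so.1 on Linux) are assumptions. cuDriverGetVersion is the CUDA driver API call, and it encodes the version as 1000*major + 10*minor.

```python
import ctypes
import sys


def decode_cuda_version(raw):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def query_driver_version(library_name=None):
    """Return (major, minor) as reported by the CUDA driver library.

    Returns None if the library cannot be loaded or the call fails.
    """
    if library_name is None:
        # Assumed default names; adjust for the local installation.
        library_name = "nvcuda.dll" if sys.platform == "win32" else "libcuda.so.1"
    # The driver API uses stdcall on Windows, so WinDLL is used there.
    loader = ctypes.WinDLL if sys.platform == "win32" else ctypes.CDLL
    try:
        lib = loader(library_name)
        raw = ctypes.c_int(0)
        # cuDriverGetVersion(int *driverVersion) returns 0 (CUDA_SUCCESS) on success.
        status = lib.cuDriverGetVersion(ctypes.byref(raw))
    except (OSError, AttributeError):
        # Library missing, or it does not export the expected symbol.
        return None
    if status != 0:
        return None
    return decode_cuda_version(raw.value)


if __name__ == "__main__":
    print("CUDA version supported by the driver:", query_driver_version())
```

If the number printed here is plausible while condor_gpu_discovery reports something else, that would point at the library incompatibility described above rather than at the driver installation.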
As for the gpu_monitor, this tool is currently only being built on Linux; the documentation needs to be updated to reflect that.

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Michael Pelletier

I think I ran into this same problem with CUDA 10.0 in a recent 8.6 release, and I think it had something to do with a change to the interface between the 9.x and 10.0 CUDA libraries. It was giving some really
off-the-wall numbers to the collector. I believe there’s a ticket open for the issue as a result of my inquiry to support. In the meantime, you can also install the 9.2 release, and then set up the library path for condor_gpu_discovery to refer to /usr/local/cuda-9.2 instead of /usr/local/cuda, and that should get you through until
they come out with an update for CUDA 10 support.

Michael V. Pelletier

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Jens Schmaler

Dear all,

we are using HTCondor 8.8 on Windows (Win 10 and Win 2016 specifically) with CUDA 10.0 installed. Some systems do have large GPUs, e.g. with 12 GB or even 32 GB of
memory. Nevertheless, condor_gpu_discovery will only show a maximum of

Besides that, I discovered that condor_gpu_discovery tries to access the registry key
"SOFTWARE\\NVIDIA Corporation\\GPU Computing Toolkit\\CUDA" which does not seem to exist on any of our systems. Could you please tell me under which circumstances you would expect this key to exist?

Thanks a lot,
Jens