Hi,
Yep, that sounds like a plausible cause. The easiest check would be to do a 'su condor' and run it from there.
Everything else looks as expected, I am afraid ...
Best
christoph
-- 
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
From: "Martin Sajdl" <masaj.xxx@xxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>, "Josef Mitlöhner" <josef.mitlohner@xxxxxx>
Sent: Friday, 3 April 2020 12:36:36
Subject: Re: [HTCondor-users] Detecting GPU
Hi guys,
      it seems the issue is that the condor_gpu_discovery utility behaves
      a bit differently when it is launched from a normal user session in
      Windows than when it is launched from the context of a running
      service (the condor daemon).
      As Josef wrote, it seems to have limited access to the GPU from the
      context of the service. This still appears to depend on the GPU
      type (this one is very old), because the limitation does not seem
      to occur on systems with newer GPUs.
      
      Masaj
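
      One way to compare the two contexts is to run the same small
      script once from an interactive prompt and once from the service
      context, and look at which account it ran under and how many GPUs
      it saw. The following is only a minimal sketch: the function names
      are made up for illustration, and it assumes condor_gpu_discovery
      is on the PATH and prints "Name = value" lines as in the listings
      in this thread.

```python
import getpass
import subprocess


def parse_attributes(text):
    """Parse 'Name = value' lines into a dict; other lines are ignored."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name.isidentifier():
            attrs[name] = value.strip().strip('"')
    return attrs


def gpu_count(attrs):
    """Count GPUs from DetectedGPUs, which may be a number or a list of ids."""
    value = attrs.get("DetectedGPUs", "0")
    if value.isdigit():
        return int(value)
    return len([gpu for gpu in value.split(",") if gpu.strip()])


def run_discovery(executable="condor_gpu_discovery"):
    """Run the tool with -extra and return (current user, parsed attributes)."""
    result = subprocess.run(
        [executable, "-extra"], capture_output=True, text=True, check=False
    )
    return getpass.getuser(), parse_attributes(result.stdout)
```

      Calling run_discovery() in both contexts and writing the user name
      and gpu_count() of the result to a file should show whether the
      difference really follows the account the tool runs under.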
    
    On 03.04.2020 11:59, Josef Mitlöhner wrote:
    
    
      
      C:\>condor_config_val -dump | grep -i gpu
        ENVIRONMENT_FOR_AssignedGPUs = GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
        ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
        MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
        STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
        STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
        STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
        STARTD_CRON_GPUs_MONITOR_PERIOD = 1
        STARTD_CRON_JOBLIST =  GPUs_MONITOR GPUs_MONITOR STARTCFG
        
        Best regards
        Josef
      
      On 3.4.2020 11:41, Beyer, Christoph wrote:
      
      
        
        
          there seems to be a little something missing somewhere ;)

          I had similar problems when we started to use GPUs. The
          cause was an individual configuration overwriting the
          feature config.

          What does condor_config_val say? It should look similar
          to this:
          
          
          
          
            [root@batchg003 ~]# condor_config_val -dump | grep -i gpu
              ENVIRONMENT_FOR_AssignedGPUs = GPU_DEVICE_ORDINAL=/(CUDA|OCL)//  CUDA_VISIBLE_DEVICES
              ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
              MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
              SLOT_TYPE_1 = GPUs=1, CPUs=2
              SLOT_WEIGHT = GPUs
              START = (NODE_IS_HEALTHY =?= True) && (StartJobs =?= True) && TARGET.RequestGpus && (RequestRuntime <= 12000)
              STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
              STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
              STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
              STARTD_CRON_GPUs_MONITOR_PERIOD = 1
              STARTD_CRON_JOBLIST = NODEHEALTH GPUs_MONITOR GPUs_MONITOR
              
            
            Best
            
            Christoph
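
            To spot a local setting that overrides the feature config,
            it can help to compare two "condor_config_val -dump"
            listings mechanically. This is a rough sketch, not an
            HTCondor tool: the helper names are hypothetical, and it
            assumes each knob sits on one "KNOB = value" line.

```python
def parse_config_dump(text):
    """Turn 'KNOB = value' lines (one knob per line) into a dict."""
    knobs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            knobs[name.strip()] = value.strip()
    return knobs


def compare_gpu_knobs(reference, candidate):
    """Report GPU-related reference knobs that are missing or differ in candidate."""
    report = {}
    for name, ref_value in reference.items():
        # Same filter as `grep -i gpu`: match on the knob name or its value.
        if "gpu" not in name.lower() and "gpu" not in ref_value.lower():
            continue
        cand_value = candidate.get(name)
        if cand_value is None:
            report[name] = ("missing", ref_value)
        elif cand_value != ref_value:
            report[name] = ("differs", cand_value)
    return report
```

            Site-specific knobs such as SLOT_TYPE_1 or START will
            naturally show up as missing on another pool, so the
            report is a starting point for a manual look rather than
            a verdict.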
            
            
            
           
          
          
          
          
          
          
          
          
          
          Hello everyone,
              I created a task that runs only "condor_gpu_discovery
              -extra", and its output was only "DetectedGPUs = 0".
              However, when I execute the command manually, it returns:
              
              C:\> condor_gpu_discovery -extra
              DetectedGPUs = "CUDA1"
              CUDACapability = 1.2
              CUDAClockMhz = 1402.00
              CUDAComputeUnits = 2
              CUDACoresPerCU = 8
              CUDADeviceName = "GeForce 210"
              CUDADevicePciBusId = "0000:05:00.0"
              CUDADeviceUuid = "00000000-0000-0000-0000-000000000000"
              CUDADriverVersion = 6.50
              CUDAECCEnabled = false
              CUDAGlobalMemoryMb = 1024
              CUDARuntimeVersion = 10.20
              
              So when it is run in HTCondor's context, condor_gpu_discovery
              does not have access to any GPU information.
              
              Best regards
              Josef
            
            On 2.4.2020 13:34, Josef Mitlöhner wrote:
            
             Hi,
              
              lspci | grep -i nvidia
              05:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
              
              C:\>condor_status -l mitlohner-w764 | grep -i gpu
              DetectedGPUs = 0
              GPUs = 0
              MachineResources = "Cpus Memory Disk Swap GPUs"
              TotalGPUs = 0
              TotalSlotGPUs = 0
              
              Best regards
              Josef
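
              The listing above can be read as "the GPU resource is
              configured, but discovery reported nothing". A small
              sketch of that reading, using only attribute names that
              appear in this thread (the function names themselves are
              hypothetical):

```python
def parse_classad(text):
    """Parse 'Attr = value' lines from `condor_status -l` output into a dict."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().isidentifier():
            ad[name.strip()] = value.strip().strip('"')
    return ad


def gpu_summary(ad):
    """Summarize whether GPUs are configured as a resource and how many exist."""
    resources = ad.get("MachineResources", "").split()
    total = ad.get("TotalGPUs", "0")
    return {
        # "GPUs" in MachineResources suggests the GPU resource is defined.
        "gpus_configured": "GPUs" in resources,
        # TotalGPUs = 0 means the startd did not find any device.
        "total_gpus": int(total) if total.isdigit() else 0,
        "assigned": ad.get("AssignedGPUs", ""),
    }
```

              With gpus_configured true and total_gpus at 0, the
              configuration side looks fine and the discovery step
              itself is the more likely place to look.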
              
              On 2.4.2020 12:45, Beyer, Christoph wrote:
              
              
                
                  Hmm,
                  
                  what does
                  
                  lspci | grep -i nvidia
                  
                  say?
                  
                  The condor_status output should look something like this:
                  
                  
                  
                  [root@batchg003 ~]# condor_status -l batchg003 | grep -i gpu
                    AssignedGPUs = "CUDA0"
                    DetectedGPUs = 1
                    GPUs = 1
                    MachineResources = "Cpus Memory Disk Swap GPUs"
                    SlotWeight = GPUs
                    Start = (NODE_IS_HEALTHY =?= true) && (StartJobs =?= true) && TARGET.RequestGpus && (RequestRuntime <= 12000)
                    TotalGPUs = 1
                    TotalSlotGPUs = 1
                    [root@batchg003 ~]# condor_status -l batchg003 | grep -i cuda
                    AssignedGPUs = "CUDA0"
                    CUDACapability = 6.1
                    CUDADeviceName = "GeForce GTX 1080 Ti"
                    CUDADevicePciBusId = "0000:65:00.0"
                    CUDADeviceUuid = "3f2d719f-7d89-c75c-1a71-94316a2fcd12"
                    CUDADriverVersion = 10.2
                    CUDAECCEnabled = false
                    CUDAGlobalMemoryMb = 11178
                    
                  
                  Best
                  
                  Christoph
                  
                  
                  
                  
                  
                  
                  
                  
                  
                  
                  Hi,
                      thank you for your reply.
                      
                      The result is the same. The only change (after
                      installing the CUDA package) is in the
                      "condor_gpu_discovery -properties" listing:
                      
                      DetectedGPUs="CUDA0"
                      CUDACapability=1.2
                      CUDADeviceName="GeForce 210"
                      CUDADevicePciBusId="0000:05:00.0"
                      CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
                      CUDADriverVersion=6.50
                      CUDAECCEnabled=false
                      CUDAGlobalMemoryMb=1024
                      CUDARuntimeVersion=10.20
                      
                      Thanks for help,
                      Best regards
                      Josef
                    
                    On 2.4.2020 10:24, Beyer, Christoph wrote:
                    
                    
                      
                        Hi,
                        
                        try
                        
                        @use feature : GPUs
                        
                        @use feature : GPUsMonitor
                        
                        The second one is not mandatory, of course,
                        but you will want it ;)
                        
                        Install the cuda and nvidia-driver packages (I
                        think those come with the cuda package, though):
                        
                        cuda.x86_64
                        
                        Restart the host and check ...
                        
                        
                        
                        Best
                        
                        christoph
                        
                        
                        
                        
                        
                        
                        
                        
                        
                        Hello,
                            when I run the command "condor_gpu_discovery
                            -properties" on my computer, it detects one
                            GPU:
                            
                            DetectedGPUs="CUDA0"
                            can't open SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA
                            CUDACapability=1.2
                            CUDADeviceName="GeForce 210"
                            CUDADevicePciBusId="0000:05:00.0"
                            CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
                            CUDADriverVersion=6.50
                            CUDAECCEnabled=false
                            CUDAGlobalMemoryMb=1024
                            
                            In condor.config I have a line with "use
                            feature : GPUs".
                            
                            Why does my HTCondor server say
                            (condor_status -l):
                            ...
                            DetectedGPUs = 0
                            ...
                            ?
                            
                            Thank you for your reply,
                            Josef
                            
                          
                          
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
                         
                      
                      