Hi,
Yep, that sounds like a possible issue. The easiest check would be to do a 'su condor' and run it from there.
Everything else looks as expected, I am afraid ...
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
From: "Martin Sajdl" <masaj.xxx@xxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>, "Josef Mitlöhner" <josef.mitlohner@xxxxxx>
Sent: Friday, 3 April 2020 12:36:36
Subject: Re: [HTCondor-users] Detecting GPU
Hi guys,
it seems the issue is that the condor_gpu_discovery utility behaves a
bit differently when it is launched from a normal user session in
Windows than when it runs in the context of a service (the condor
daemon)...
As Josef wrote, it seems to have limited access to the GPU from the
context of the service, but this is still somehow linked to the GPU type
(this one is very old), because the limitation does not seem to be
there on systems with newer GPUs.
Masaj
On 03.04.2020 11:59, Josef Mitlöhner wrote:
C:\>condor_config_val -dump | grep -i gpu
ENVIRONMENT_FOR_AssignedGPUs = GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
STARTD_CRON_GPUs_MONITOR_PERIOD = 1
STARTD_CRON_JOBLIST = GPUs_MONITOR GPUs_MONITOR STARTCFG
Best regards
Josef
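The two configuration dumps in this thread can also be compared key by key instead of by eye. The following is only a rough sketch, assuming each setting sits on a single unwrapped line in the form NAME = value; the helper names parse_config_dump and compare_configs are made up for illustration and are not part of HTCondor. The sample text is a shortened excerpt of the two dumps quoted in this thread.

```python
def parse_config_dump(text):
    """Parse 'NAME = value' lines (as printed by condor_config_val -dump) into a dict.

    Assumption: every setting is on one line; wrapped lines must be joined first.
    """
    config = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        # Split on the first '=' only, so values such as "GPUs=1, CPUs=2" stay intact.
        name, _, value = line.partition("=")
        config[name.strip()] = value.strip()
    return config


def compare_configs(reference, candidate):
    """Return (keys missing from candidate, keys whose values differ)."""
    missing = sorted(set(reference) - set(candidate))
    changed = sorted(
        key for key in reference.keys() & candidate.keys()
        if reference[key] != candidate[key]
    )
    return missing, changed


# Shortened excerpts of the two dumps from this thread.
reference_dump = """SLOT_TYPE_1 = GPUs=1, CPUs=2
SLOT_WEIGHT = GPUs
STARTD_CRON_GPUs_MONITOR_PERIOD = 1
STARTD_CRON_JOBLIST = NODEHEALTH GPUs_MONITOR GPUs_MONITOR
"""
candidate_dump = """STARTD_CRON_GPUs_MONITOR_PERIOD = 1
STARTD_CRON_JOBLIST = GPUs_MONITOR GPUs_MONITOR STARTCFG
"""

missing, changed = compare_configs(
    parse_config_dump(reference_dump), parse_config_dump(candidate_dump)
)
print("missing:", missing)
print("changed:", changed)
```

A difference reported this way is not necessarily an error: settings such as SLOT_TYPE_1 or the job list are site-specific, so the result is only a starting point for checking whether a local configuration overrides the GPU feature settings.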
On 3.4.2020 11:41, Beyer, Christoph wrote:
there seems to be a little something missing somewhere ;)
I had similar problems when we started to use GPUs; the
cause was an individual configuration overwriting the
feature config.
What does condor_config_val say? It should look somewhat
similar to this:
[root@batchg003 ~]# condor_config_val -dump | grep -i gpu
ENVIRONMENT_FOR_AssignedGPUs = GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
SLOT_TYPE_1 = GPUs=1, CPUs=2
SLOT_WEIGHT = GPUs
START = (NODE_IS_HEALTHY =?= True) && (StartJobs =?= True) && TARGET.RequestGpus && (RequestRuntime <= 12000)
STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
STARTD_CRON_GPUs_MONITOR_PERIOD = 1
STARTD_CRON_JOBLIST = NODEHEALTH GPUs_MONITOR GPUs_MONITOR
Best
Christoph
Hello everyone,
I made a task where there was only "condor_gpu_discovery
-extra" and the output was only "DetectedGPUs = 0".
However, when I execute the command manually, it returns:
C:\> condor_gpu_discovery -extra
DetectedGPUs = "CUDA1"
CUDACapability = 1.2
CUDAClockMhz = 1402.00
CUDAComputeUnits = 2
CUDACoresPerCU = 8
CUDADeviceName = "GeForce 210"
CUDADevicePciBusId = "0000:05:00.0"
CUDADeviceUuid = "00000000-0000-0000-0000-000000000000"
CUDADriverVersion = 6.50
CUDAECCEnabled = false
CUDAGlobalMemoryMb = 1024
CUDARuntimeVersion = 10.20
So in the configuration context, condor_gpu_discovery does
not have access to any GPU information.
Best regards
Josef
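The difference between the two contexts can be made explicit by parsing both outputs and comparing the GPU count. What follows is a minimal sketch, not part of HTCondor: the function names are hypothetical, and it assumes DetectedGPUs is either a bare count (as in "DetectedGPUs = 0") or a quoted list of device names (as in "CUDA1"), with several names separated by commas. The sample text is a shortened copy of the output above.

```python
def parse_discovery_output(text):
    """Parse 'Name = value' lines as printed by condor_gpu_discovery into a dict."""
    attrs = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and diagnostics without an assignment
        name, _, value = line.partition("=")
        attrs[name.strip()] = value.strip()
    return attrs


def count_detected_gpus(attrs):
    """Return the number of GPUs implied by the DetectedGPUs attribute."""
    value = attrs.get("DetectedGPUs", "0")
    if value.startswith('"'):
        # Quoted form: assumed to be a comma-separated list of device names.
        names = [name.strip() for name in value.strip('"').split(",")]
        return len([name for name in names if name])
    return int(value)  # bare form: a plain count such as 0


# Output seen when running the tool by hand (shortened).
interactive = """DetectedGPUs = "CUDA1"
CUDACapability = 1.2
CUDADeviceName = "GeForce 210"
"""
# Output seen when the tool ran as a task in the service context.
service = "DetectedGPUs = 0\n"

print(count_detected_gpus(parse_discovery_output(interactive)))
print(count_detected_gpus(parse_discovery_output(service)))
```

Running the same check on output captured from a scheduled task and from an interactive prompt gives a quick yes/no answer on whether the service context sees the GPU at all.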
On 2.4.2020 13:34, Josef Mitlöhner wrote:
Hi,
lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
C:\>condor_status -l mitlohner-w764 | grep -i gpu
DetectedGPUs = 0
GPUs = 0
MachineResources = "Cpus Memory Disk Swap GPUs"
TotalGPUs = 0
TotalSlotGPUs = 0
Best regards
Josef
On 2.4.2020 12:45, Beyer, Christoph wrote:
Hmm,
what does
lspci | grep -i nvidia
say?
condor_status should look something like this:
[root@batchg003 ~]# condor_status -l batchg003 | grep -i gpu
AssignedGPUs = "CUDA0"
DetectedGPUs = 1
GPUs = 1
MachineResources = "Cpus Memory Disk Swap GPUs"
SlotWeight = GPUs
Start = (NODE_IS_HEALTHY =?= true) && (StartJobs =?= true) && TARGET.RequestGpus && (RequestRuntime <= 12000)
TotalGPUs = 1
TotalSlotGPUs = 1
[root@batchg003 ~]# condor_status -l batchg003 | grep -i cuda
AssignedGPUs = "CUDA0"
CUDACapability = 6.1
CUDADeviceName = "GeForce GTX 1080 Ti"
CUDADevicePciBusId = "0000:65:00.0"
CUDADeviceUuid = "3f2d719f-7d89-c75c-1a71-94316a2fcd12"
CUDADriverVersion = 10.2
CUDAECCEnabled = false
CUDAGlobalMemoryMb = 11178
Best
Christoph
Hi,
thank you for your reply.
The result is the same. The only change (after
installing the CUDA package) is in the
"condor_gpu_discovery -properties" listing:
DetectedGPUs="CUDA0"
CUDACapability=1.2
CUDADeviceName="GeForce 210"
CUDADevicePciBusId="0000:05:00.0"
CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
CUDADriverVersion=6.50
CUDAECCEnabled=false
CUDAGlobalMemoryMb=1024
CUDARuntimeVersion=10.20
Thanks for help,
Best regards
Josef
On 2.4.2020 10:24, Beyer, Christoph wrote:
Hi,
try
@use feature : GPUs
@use feature : GPUsMonitor
The second one is not mandatory of course
but you will want it ;)
Install the cuda and nvidia-driver pkgs (I
think those come with the cuda pkg though):
cuda.x86_64
Restart the host and check ...
Best
christoph
Hello,
when I run the command "condor_gpu_discovery -properties" on my
computer, it detects one GPU:
DetectedGPUs="CUDA0"
can't open SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA
CUDACapability=1.2
CUDADeviceName="GeForce 210"
CUDADevicePciBusId="0000:05:00.0"
CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
CUDADriverVersion=6.50
CUDAECCEnabled=false
CUDAGlobalMemoryMb=1024
In condor.config I have the line "use feature : GPUs".
Why does my HTCondor server report (condor_status -l):
...
DetectedGPUs = 0
...
?
Thank you for your reply
Josef
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to
htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/