There seems to be a little something missing somewhere ;)
I had similar problems when we started to use GPUs, the
cause was an individual configuration overwriting the
feature config.
What does condor_config_val say? It should look something
like this:
[root@batchg003 ~]# condor_config_val -dump | grep -i gpu
ENVIRONMENT_FOR_AssignedGPUs = GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
SLOT_TYPE_1 = GPUs=1, CPUs=2
SLOT_WEIGHT = GPUs
START = (NODE_IS_HEALTHY =?= True) && (StartJobs =?= True) && TARGET.RequestGpus && (RequestRuntime <= 12000)
STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
STARTD_CRON_GPUs_MONITOR_PERIOD = 1
STARTD_CRON_JOBLIST = NODEHEALTH GPUs_MONITOR GPUs_MONITOR
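
To compare your own dump against a listing like this one, a small script can flag GPU knobs that are absent or empty. This is only a sketch: the helper names `parse_config_dump` and `missing_gpu_knobs` are made up for illustration, the two knobs it checks are simply taken from the listing above, and it assumes each knob sits on a single `NAME = value` line and that knob names can be compared case-insensitively.

```python
# Sketch: check a saved `condor_config_val -dump` listing for GPU knobs.
# Helper names are hypothetical; the knob names come from the listing above.

EXPECTED_KNOBS = (
    "MACHINE_RESOURCE_INVENTORY_GPUs",
    "ENVIRONMENT_FOR_AssignedGPUs",
)


def parse_config_dump(text):
    """Turn 'NAME = value' lines into a dict; lines without '=' are skipped."""
    knobs = {}
    for line in text.splitlines():
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = line.partition("=")
        if sep and name.strip():
            knobs[name.strip()] = value.strip()
    return knobs


def missing_gpu_knobs(text, expected=EXPECTED_KNOBS):
    """Return the expected knobs that are absent or empty in the dump."""
    # Assumption: knob names are compared case-insensitively.
    knobs = {k.lower(): v for k, v in parse_config_dump(text).items()}
    return [name for name in expected if not knobs.get(name.lower())]


sample = """\
MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
SLOT_WEIGHT = GPUs
"""
print(missing_gpu_knobs(sample))
```

With the two-line sample, the script reports `ENVIRONMENT_FOR_AssignedGPUs` as missing. On a real host you would feed it the text saved from `condor_config_val -dump` instead of `sample`.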
Best
Christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail:
christoph.beyer@xxxxxxx
Hello everyone,
I made a task where there was only "condor_gpu_discovery
-extra" and the output was only "DetectedGPUs = 0".
However, when I execute the command manually, it returns:
C:\> condor_gpu_discovery -extra
DetectedGPUs = "CUDA1"
CUDACapability = 1.2
CUDAClockMhz = 1402.00
CUDAComputeUnits = 2
CUDACoresPerCU = 8
CUDADeviceName = "GeForce 210"
CUDADevicePciBusId = "0000:05:00.0"
CUDADeviceUuid = "00000000-0000-0000-0000-000000000000"
CUDADriverVersion = 6.50
CUDAECCEnabled = false
CUDAGlobalMemoryMb = 1024
CUDARuntimeVersion = 10.20
So in the configuration context, condor_gpu_discovery does
not have access to any GPU information.
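
To make the difference between the two contexts easy to check, the two outputs can be saved to text and compared with a short script. This is a rough sketch, not part of HTCondor: `detected_gpu_count` is a hypothetical helper, and it assumes that several GPUs would appear as a comma-separated list inside the quotes, which this thread does not show.

```python
import re


def detected_gpu_count(text):
    """Count GPUs from a DetectedGPUs line in discovery or condor_status output.

    Handles the two forms seen in this thread:
      DetectedGPUs = "CUDA1"   (quoted device ids)
      DetectedGPUs = 0         (plain number)
    """
    match = re.search(r'^\s*DetectedGPUs\s*=\s*(.*)$', text, re.MULTILINE)
    if match is None:
        return 0
    value = match.group(1).strip()
    if value.startswith('"'):
        # Assumption: multiple ids are comma-separated inside the quotes.
        ids = [gpu.strip() for gpu in value.strip('"').split(",")]
        return len([gpu for gpu in ids if gpu])
    try:
        return int(value)
    except ValueError:
        return 0


interactive = 'DetectedGPUs = "CUDA1"\nCUDADeviceName = "GeForce 210"\n'
from_task = "DetectedGPUs = 0\n"

print(detected_gpu_count(interactive), detected_gpu_count(from_task))
```

If the interactive count is non-zero and the count from the task is zero, as here, the discovery tool itself works and the difference lies in the context it runs in.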
Best regards
Josef
On 2.4.2020 13:34, Josef Mitlöhner wrote:
Hi,
lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
C:\>condor_status -l mitlohner-w764 | grep -i gpu
DetectedGPUs = 0
GPUs = 0
MachineResources = "Cpus Memory Disk Swap GPUs"
TotalGPUs = 0
TotalSlotGPUs = 0
Best regards
Josef
On 2.4.2020 12:45, Beyer, Christoph wrote:
Hmm,
what does
lspci | grep -i nvidia
say?
condor_status should look something like this:
[root@batchg003 ~]# condor_status -l batchg003 | grep -i gpu
AssignedGPUs = "CUDA0"
DetectedGPUs = 1
GPUs = 1
MachineResources = "Cpus Memory Disk Swap GPUs"
SlotWeight = GPUs
Start = (NODE_IS_HEALTHY =?= true) && (StartJobs =?= true) && TARGET.RequestGpus && (RequestRuntime <= 12000)
TotalGPUs = 1
TotalSlotGPUs = 1
[root@batchg003 ~]# condor_status -l batchg003 | grep -i cuda
AssignedGPUs = "CUDA0"
CUDACapability = 6.1
CUDADeviceName = "GeForce GTX 1080 Ti"
CUDADevicePciBusId = "0000:65:00.0"
CUDADeviceUuid = "3f2d719f-7d89-c75c-1a71-94316a2fcd12"
CUDADriverVersion = 10.2
CUDAECCEnabled = false
CUDAGlobalMemoryMb = 11178
Best
Christoph
Hi,
thank you for your reply.
The result is the same. The only change (after
installing the CUDA package) is in the
"condor_gpu_discovery -properties" listing:
DetectedGPUs="CUDA0"
CUDACapability=1.2
CUDADeviceName="GeForce 210"
CUDADevicePciBusId="0000:05:00.0"
CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
CUDADriverVersion=6.50
CUDAECCEnabled=false
CUDAGlobalMemoryMb=1024
CUDARuntimeVersion=10.20
Thanks for help,
Best regards
Josef
On 2.4.2020 10:24, Beyer, Christoph wrote:
Hi,
try
@use feature : GPUs
@use feature : GPUsMonitor
The second one is not mandatory, of course,
but you will want it ;)
Install the cuda and nvidia-driver packages (I
think those come with the cuda package, though):
cuda.x86_64
Restart the host and check ...
Best
christoph
Hello,
when I run the command "condor_gpu_discovery -properties"
on my computer, it detects one GPU:
DetectedGPUs="CUDA0"
can't open SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA
CUDACapability=1.2
CUDADeviceName="GeForce 210"
CUDADevicePciBusId="0000:05:00.0"
CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
CUDADriverVersion=6.50
CUDAECCEnabled=false
CUDAGlobalMemoryMb=1024
In condor.config I have a line with "use feature : GPUs".
Why does my HTCondor server report (condor_status -l):
...
DetectedGPUs = 0
...
?
Thank you for your reply
Josef
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to
htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/