[HTCondor-users] GPU monitoring vanished in my pool :(
- Date: Tue, 05 May 2020 11:37:57 +0200 (CEST)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: [HTCondor-users] GPU monitoring vanished in my pool :(
Hi,
I use the two GPU features on my GPU nodes:
[root@batchg003 ~]# condor_config_val use feature:GPUs
use FEATURE:GPUs is
MACHINE_RESOURCE_INVENTORY_GPUs=$(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
ENVIRONMENT_FOR_AssignedGPUs=GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs=10000
use feature : GPUsMonitor
[root@batchg003 ~]# condor_config_val use feature:GPUsMonitor
use FEATURE:GPUsMonitor is
use feature : Monitor( GPUs, WaitForExit, 1, $(LIBEXEC)/condor_gpu_utilization, SUM:GPUs, PEAK:GPUsMemory )
For a while I was able to check the resulting values for a job, like this:
condor_history 11262904 -af:l GPUsMemoryUsage GPUsProvisioned GPUsUsage
> GPUsMemoryUsage = 29261.0 GPUsProvisioned = 4 GPUsUsage = 3.688929331491713
(provided the job used any GPUs, of course). Unfortunately, these attributes have vanished from my history without any changes having been made (at least no intentional ones, I should say).
I run 8.9.3 on the GPU nodes and 8.9.1 on the schedd, but that should not explain it, right?
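To see which of these attributes are missing for a given job, the output line could be checked with a small script. This is only a sketch: it assumes each pair is printed as "Name = value", that the values contain no spaces, and that a missing attribute is printed as "undefined". The helper names parse_af_line and missing_attrs and the "bad" sample line are made up for illustration; only the "good" line is taken from the output above.

```python
import re

# Assumption: each pair looks like "Name = value" and values contain no spaces.
PAIR = re.compile(r"(\w+) = (\S+)")

def parse_af_line(line):
    """Turn one line of "Name = value ..." output into a dict of strings."""
    return dict(PAIR.findall(line))

def missing_attrs(line, wanted):
    """Return the wanted attributes that are absent or printed as 'undefined'."""
    attrs = parse_af_line(line)
    return [name for name in wanted
            if attrs.get(name, "undefined") == "undefined"]

WANTED = ["GPUsMemoryUsage", "GPUsProvisioned", "GPUsUsage"]

# Line copied from the working job shown above.
good = "GPUsMemoryUsage = 29261.0 GPUsProvisioned = 4 GPUsUsage = 3.688929331491713"
# Hypothetical line for a job where the monitoring attributes are gone.
bad = "GPUsMemoryUsage = undefined GPUsProvisioned = undefined GPUsUsage = undefined"

print(missing_attrs(good, WANTED))
print(missing_attrs(bad, WANTED))
```

Feeding it the condor_history output for a few recent GPU jobs would show whether all three attributes are gone or only the ones coming from the monitor.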
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx