Hello,
What you'd want to do is set up a startd cron job. The ClassAd output from the job is merged into the machine ClassAd, so the new attributes become queryable with condor_status.
I do something similar with a job that calls ipmitool to check the power and cooling status of the machine and set a PowerOrCoolingFault Boolean attribute, allowing it to reject jobs if a PSU or fan fault is flagged.
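For the GPU power draw question, a startd cron executable could look roughly like the sketch below. It assumes NVIDIA GPUs with nvidia-smi on the PATH. The attribute name GPU<N>PowerDrawWatts is made up for illustration and is not something HTCondor defines. The script prints one "Attribute = value" line per GPU, which is the form of output a startd cron job hands back to the startd.

```python
#!/usr/bin/env python3
"""Startd cron sketch: publish per-GPU power draw as machine ClassAd attributes."""
import subprocess

# nvidia-smi query for "index, watts" rows without header or units.
QUERY = ["nvidia-smi", "--query-gpu=index,power.draw",
         "--format=csv,noheader,nounits"]


def to_classad_lines(csv_text):
    """Turn 'index, watts' rows into 'Attr = value' lines.

    The attribute name GPU<N>PowerDrawWatts is hypothetical.
    """
    lines = []
    for row in csv_text.splitlines():
        parts = [p.strip() for p in row.split(",")]
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        index, watts = parts
        try:
            value = float(watts)
        except ValueError:
            # e.g. "[N/A]" on cards that do not report power draw
            continue
        lines.append(f"GPU{index}PowerDrawWatts = {value:.2f}")
    return lines


def main():
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True,
                                check=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi missing, failed, or timed out: publish nothing this round.
        return
    for line in to_classad_lines(result.stdout):
        print(line)


if __name__ == "__main__":
    main()
```

Once the attributes are in the machine ad, something like "condor_status -af Machine GPU0PowerDrawWatts" should show them, again assuming the hypothetical attribute name above.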
You can set the interval for startd cron jobs in the configuration. Bear in mind that the collector is only updated periodically so a higher frequency doesn't gain you anything. I think it's possible to push updates immediately from startd cron, but you'd want to keep an eye on the collector load in that case if you have a lot of machines.
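To make the interval setting concrete, here is a small Python helper that prints the configuration stanza for one periodic startd cron job. The STARTD_CRON_* knob names are the standard HTCondor ones. The job name GPUPOWER, the script path, and the 300-second period are placeholders chosen for this sketch.

```python
def startd_cron_stanza(job, executable, period_seconds):
    """Return HTCondor config lines for one periodic startd cron job."""
    prefix = f"STARTD_CRON_{job}"
    return "\n".join([
        # Append to the existing job list rather than overwriting it.
        f"STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) {job}",
        f"{prefix}_EXECUTABLE = {executable}",
        f"{prefix}_MODE = Periodic",
        # How often the startd runs the executable, in seconds.
        f"{prefix}_PERIOD = {period_seconds}s",
    ])


if __name__ == "__main__":
    # Hypothetical job name and script path.
    print(startd_cron_stanza("GPUPOWER",
                             "/usr/local/libexec/gpu_power_cron.py", 300))
```

The printed lines would go into a local config file on the execute nodes, followed by a condor_reconfig. The period only controls how often the script runs; how soon the values show up in condor_status still depends on when the startd next updates the collector.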
-Michael Pelletier
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Benedikt Riedel <briedel@xxxxxxxxxxxxxxxx>
Sent: Wednesday, March 20, 2024 5:08:58 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] [HTCondor-users] Additional GPU statistics

Hi,
Is there a way to get additional GPU statistics like the power draw through condor? Is there a way to increase the query rate for GPU statistics from HTCondor?
Thanks,
Benedikt
--
Benedikt Riedel
Global Computing Coordinator, IceCube Neutrino Observatory
Technical Coordinator, IceCube Neutrino Observatory
Computing Manager, Wisconsin IceCube Particle Astrophysics Center
University of Wisconsin-Madison