Hi Angel,
After a quick code dive, it looks like DMAX is set to TMAX*3 with a hardcoded 86400-second minimum, and there is currently no way to override this behavior. Seems like a simple enough request.
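As a rough sketch of the rule described above (this is not the actual HTCondor source; the function and constant names are hypothetical and only restate the behavior in Python):

```python
# Sketch of the DMAX rule as described in this message. This is NOT the
# HTCondor source; all names here are hypothetical.
DMAX_FLOOR_SECONDS = 86400  # hardcoded minimum (one full day)


def effective_dmax(tmax: int) -> int:
    """Return TMAX*3, but never less than the hardcoded floor."""
    return max(tmax * 3, DMAX_FLOOR_SECONDS)


print(effective_dmax(120))    # TMAX reported for the metrics in this thread
print(effective_dmax(40000))  # a TMAX large enough to exceed the floor
```

With TMAX=120, as in the metrics quoted in the original message, the floor wins and the result is 86400, which matches the DMAX values reported there.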
Cheers,
Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Angel de Vicente <angel.vicente.garrido@xxxxxxxxx>
Sent: Wednesday, January 18, 2023 3:02 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Being able to set the Gangliad DMAX value in the GangliaD metrics definitions

Hello,
I've been configuring the HTCondor+Ganglia integration, and overall it works quite nicely, but I found an issue that should be easy to fix and would improve the quality of the graphs.

The issue is due to the fact that the DMAX value injected into Ganglia by Gangliad is always 86400 (1 full day). As an example, for a test pool, I have right now:

,----
| $ condor_status
| Name               OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
|
| slot1@xxxxxxxxxx   LINUX  X86_64  Owner      Idle      0.000   1957  0+00:00:00
| slot2@xxxxxxxxxx   LINUX  X86_64  Owner      Idle      0.000   1957  0+00:00:00
| slot3@xxxxxxxxxx   LINUX  X86_64  Owner      Idle      0.000   1957  0+00:00:00
| slot4@xxxxxxxxxx   LINUX  X86_64  Owner      Idle      0.000   1957  0+00:00:00
| slot1@xxxxxxxxxxxx LINUX  X86_64  Unclaimed  Idle      0.000   1957  0+00:00:00
| slot2@xxxxxxxxxxxx LINUX  X86_64  Unclaimed  Idle      0.000   1957  0+00:30:00
| slot3@xxxxxxxxxxxx LINUX  X86_64  Unclaimed  Idle      0.000   1957  0+05:45:00
| slot4@xxxxxxxxxxxx LINUX  X86_64  Unclaimed  Idle      0.000   1957  0+05:45:00
|
|               Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
|
| X86_64/LINUX      8      4        0          4        0           0         0      0
|
|        Total      8      4        0          4        0           0         0      0
`----

I have defined three metrics: "Owner", "CPUsInUse" and "CPUsNotInUse".
Right now the pool only has 4 slots in the "owner" state and 4 slots not in use, but if I query the values for these metrics I get the following:

,----
| $ telnet localhost 8651 | grep -i 'name="owner'
| <METRIC NAME="Owner" VAL="4" TYPE="int32" UNITS="" TN="30" TMAX="120" DMAX="86400" SLOPE="both" SOURCE="gmond">
|
| $ telnet localhost 8651 | grep -i 'name="cpusinuse'
| <METRIC NAME="CPUsInUse" VAL="2" TYPE="int32" UNITS="" TN="35237" TMAX="120" DMAX="86400" SLOPE="both" SOURCE="gmond">
|
| $ telnet localhost 8651 | grep -i 'name="cpusnotinuse'
| <METRIC NAME="CPUsNotInUse" VAL="4" TYPE="int32" UNITS="" TN="36" TMAX="120" DMAX="86400" SLOPE="both" SOURCE="gmond">
`----

"Owner" and "CPUsNotInUse" get valid and up-to-date values (TN is small), but since there are no CPUs in use by HTCondor and DMAX is a full day, this metric keeps the value from a few hours ago.

For some metrics a large DMAX probably makes sense, but in this case I would like to have a small DMAX (perhaps 120). Being able to override the default DMAX value when defining a new metric would help in a situation like this.

Is there already a way to do this? If not, perhaps I can submit a PR for it?

Cheers,
--
Ángel de Vicente
-- (GPG: 0x64D9FDAE7CD5E939)
Research Software Engineer (Supercomputing and BigData)
Instituto de Astrofísica de Canarias (https://www.iac.es/en)
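To illustrate the staleness described in this message: in gmond's XML, TN is the number of seconds since a metric was last reported and TMAX is the longest expected gap between reports, so a metric whose TN exceeds TMAX is overdue. The sketch below applies that check to the three METRIC elements quoted above. The `is_stale` helper is hypothetical, and the elements have been self-closed with `/>` so that each one parses on its own.

```python
import xml.etree.ElementTree as ET

# METRIC elements copied from the gmond output quoted in this message,
# self-closed ("/>") so each parses as a standalone XML fragment.
SAMPLE = [
    '<METRIC NAME="Owner" VAL="4" TYPE="int32" UNITS="" TN="30" '
    'TMAX="120" DMAX="86400" SLOPE="both" SOURCE="gmond"/>',
    '<METRIC NAME="CPUsInUse" VAL="2" TYPE="int32" UNITS="" TN="35237" '
    'TMAX="120" DMAX="86400" SLOPE="both" SOURCE="gmond"/>',
    '<METRIC NAME="CPUsNotInUse" VAL="4" TYPE="int32" UNITS="" TN="36" '
    'TMAX="120" DMAX="86400" SLOPE="both" SOURCE="gmond"/>',
]


def is_stale(metric_xml: str) -> bool:
    """Hypothetical check: treat a metric as stale once TN exceeds TMAX."""
    elem = ET.fromstring(metric_xml)
    return int(elem.get("TN")) > int(elem.get("TMAX"))


# Collect the names of metrics that have not been refreshed within TMAX.
stale = [ET.fromstring(m).get("NAME") for m in SAMPLE if is_stale(m)]
print(stale)
```

Only "CPUsInUse" is flagged (TN=35237 against TMAX=120). Its TN is still below the DMAX of 86400, so gmond keeps serving the old value, which is the behavior a smaller DMAX would avoid.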