[HTCondor-users] Outage timekeeping?

Mailing List Archives	UW Madison Computer Sciences Department Computer Systems Lab

Date: Thu, 03 Jun 2021 20:45:43 +0000

From: Michael Pelletier <michael.v.pelletier@xxxxxxxxxxxx>

Subject: [HTCondor-users] Outage timekeeping?

Has anyone cooked up a good way to keep statistics on exec node outages? I’m looking for something comparable to the node down-time statistics that SLURM's sreport provides.

I’ve got a couple of ideas, but I’m not sure how they’d work or whether they’d be efficient and reliable. One idea is a startd cron or schedd cron job that writes the current time to a state file on each run, and then adds to a “downtime” value whenever the gap between successive timestamps is larger than the job's run interval.
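A minimal sketch of that heartbeat idea, assuming a Python script run as a startd cron job whose period matches INTERVAL. The state file path, the attribute names NodeDowntimeSeconds and NodeLastHeartbeat, and the INTERVAL/SLACK values are all hypothetical. Any gap longer than one interval plus some slack is counted as downtime, and the running total is printed as "Attribute = value" lines for the startd to merge into the machine ad.

```python
import json
import os
import sys
import tempfile
import time

# Hypothetical values: the cron job is assumed to run every 300 s, and up to
# 60 s of scheduling jitter is tolerated before a gap counts as downtime.
INTERVAL = 300
SLACK = 60


def update_downtime(state, now, interval=INTERVAL, slack=SLACK):
    """Return the new state after a heartbeat at epoch time `now`.

    `state` holds 'last_seen' (epoch seconds, None on first run) and
    'downtime' (accumulated seconds). If the gap since the last heartbeat
    exceeds interval + slack, the excess over one interval is added.
    """
    last = state.get("last_seen")
    downtime = state.get("downtime", 0.0)
    if last is not None:
        gap = now - last
        if gap > interval + slack:
            downtime += gap - interval
    return {"last_seen": now, "downtime": downtime}


def load_state(path):
    """Read the state file, starting fresh if it is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"last_seen": None, "downtime": 0.0}


def save_state(path, state):
    """Write to a temp file in the same directory, then rename atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def main(argv):
    # The state file path is passed as the only argument (hypothetical layout).
    if len(argv) != 2:
        print("usage: downtime_probe.py STATE_FILE", file=sys.stderr)
        return 1
    path = argv[1]
    state = update_downtime(load_state(path), time.time())
    save_state(path, state)
    # Startd cron output: one "Attribute = value" line per machine-ad attribute.
    print(f"NodeDowntimeSeconds = {int(state['downtime'])}")
    print(f"NodeLastHeartbeat = {int(state['last_seen'])}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

It could be wired in with STARTD_CRON_JOBLIST plus the per-job STARTD_CRON_<name>_EXECUTABLE, _ARGS and _PERIOD settings, with the period matching INTERVAL. The state file has to live on persistent local storage, since a tmpfs is wiped by the very reboot being measured. Because it relies on wall-clock time, a large clock step would also be miscounted as downtime.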

However, I’m wondering whether there is an established way to create persistent machine ClassAds without involving state files.

Thanks for any ideas you might have.

Michael V Pelletier

Principal Engineer

C: +1 339.293.9149
michael.v.pelletier@xxxxxxx

Raytheon Technologies

Information Technology

50 Apple Hill Drive

Tewksbury, MA 01876-1198

RTX.com | LinkedIn | Twitter | Instagram