Has anyone cooked up a good way to keep statistics on exec node outages? I’m looking for something comparable to the SLURM statistics from sreport. I’ve got a couple of ideas, but I’m not really sure how they’d work or whether they’d be efficient and reliable. One idea is a startd cron or schedd cron job that writes the current time to a state file and then increases a “downtime” value whenever a gap larger than the query interval appears between successive timestamps. However, I’m wondering whether there is an established way to create persistent machine ClassAds without involving state files.

Thanks for any ideas you might have.

Michael V Pelletier
Principal Engineer
Information Technology
50 Apple Hill Drive
Tewksbury, MA 01876-1198
RTX.com
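A minimal sketch of the state-file heartbeat idea from the question, written as a startd cron script. It assumes the usual startd cron convention that each `Name = value` line printed to stdout becomes a machine ClassAd attribute. The state file path, the interval and slack values, and the attribute names `NodeLastHeartbeat`, `NodeDowntimeSeconds` and `NodeOutageCount` are all hypothetical. Any gap longer than one interval plus some slack is counted as an outage, and only the time beyond the expected interval is added to the downtime total, so the figure is an estimate rather than an exact measurement.

```python
#!/usr/bin/env python3
"""Hypothetical startd cron heartbeat that estimates node downtime via a state file."""
import json
import os
import sys
import time

# Hypothetical settings: INTERVAL should match the cron job's configured period,
# and the state file must live on persistent storage (not a tmpfs cleared at boot).
STATE_FILE = "/var/lib/condor/node_downtime.json"
INTERVAL = 300  # expected seconds between cron runs
SLACK = 60      # tolerated scheduling jitter before a gap counts as an outage


def update_state(state, now, interval=INTERVAL, slack=SLACK):
    """Return a new state dict with the heartbeat and downtime totals updated."""
    last = state.get("last_seen")
    downtime = state.get("downtime", 0.0)
    outages = state.get("outages", 0)
    if last is not None:
        gap = now - last
        if gap > interval + slack:
            # Count only the time beyond one expected interval as downtime.
            downtime += gap - interval
            outages += 1
    return {"last_seen": now, "downtime": downtime, "outages": outages}


def load_state(path):
    """Read the previous state; a missing or corrupt file starts fresh."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def save_state(path, state):
    """Write to a temp file and rename, so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def main():
    state = update_state(load_state(STATE_FILE), time.time())
    try:
        save_state(STATE_FILE, state)
    except OSError as err:
        # Still publish the attributes, but report the failure on stderr.
        print(f"could not save {STATE_FILE}: {err}", file=sys.stderr)
    # Each line becomes a machine ClassAd attribute (hypothetical names).
    print(f"NodeLastHeartbeat = {int(state['last_seen'])}")
    print(f"NodeDowntimeSeconds = {int(state['downtime'])}")
    print(f"NodeOutageCount = {state['outages']}")


if __name__ == "__main__":
    main()
```

Such a script would presumably be hooked up through the usual `STARTD_CRON_JOBLIST`, `STARTD_CRON_<NAME>_EXECUTABLE` and `STARTD_CRON_<NAME>_PERIOD` settings, with the period kept equal to `INTERVAL`. Because a node that is down cannot run the script, the outage is only recorded on the first run after it comes back, and the true outage lies somewhere between `gap - INTERVAL` and `gap`.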