Re: [HTCondor-users] Outage timekeeping?
- Date: Fri, 04 Jun 2021 14:45:01 -0500 (CDT)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Outage timekeeping?
However, I'm wondering if there is an established way to create
persistent machine classads without involving state files.
Have you looked at OFFLINE and ABSENT ads in the collector?
Absent ads are written to disk and persist across restarts and
reboots. An ad becomes "absent", IIRC, when it would otherwise age out of
the collector.
(Ads age out of the collector every few minutes; I forget which
knob controls the rate. IIRC, if an ad isn't updated for three
consecutive update intervals, the collector throws it away, figuring that
Something Terrible has happened to either the daemon sending it or the
network in between.)
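As a rough illustration of the aging rule recalled above, the check could be sketched as follows. This is not the collector's actual code: the function name, its parameters, and the 300-second interval in the usage note are hypothetical, and the threshold of three missed updates is only the recollection stated above.

```python
def has_aged_out(last_update: int, now: int, update_interval: int,
                 missed_updates: int = 3) -> bool:
    """Return True if no update has arrived for more than
    `missed_updates` consecutive update intervals.

    All times are in seconds. The default of 3 mirrors the
    recollection in the message, not a documented value.
    """
    return now - last_update > missed_updates * update_interval
```

With an assumed 300-second update interval, an ad last updated at t=0 would still be considered live at t=900 and would be treated as aged out at t=901.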
The intended use of absent ads is to make it possible to check,
via the collector, which machines "should" be in your pool as opposed to
which ones actually are. (Obviously, if your pool is glide-ins, this is
mostly useless.) There's a knob you can use to determine which ads you
keep (e.g., you only want uptime numbers for startds you control).
The absent ad will contain the ad's usual attributes, including
the last update time, which will give you an approximation of how long the
machine has been down at the time that you checked. This won't be quite
the same number as downtime of the machine, or the downtime of the startd,
but since (generally speaking) a startd that's not in the collector can't
do useful work, it may be a number you care about, and close enough to
what you actually want.
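A minimal sketch of that approximation, assuming the absent ad has already been fetched and is available as a dict-like object, and assuming the last update time is carried in an attribute named LastHeardFrom holding seconds since the epoch (check the attribute name against your own ads). The function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def approx_downtime_seconds(ad: Mapping[str, object],
                            now: Optional[float] = None) -> float:
    """Approximate how long a machine has been out of the collector.

    Assumes `ad` carries its last update time as epoch seconds under
    the key "LastHeardFrom" (an assumption, see the lead-in).
    """
    last_heard = int(ad["LastHeardFrom"])
    if now is None:
        now = time.time()
    # Clamp at zero in case of clock skew between hosts.
    return max(0.0, float(now) - last_heard)
```

As noted above, the result measures time since the collector last heard from the startd, not the true downtime of the machine or of the startd itself.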
Absent ads also eventually age out, and are also removed when an
update for the same ad arrives.
- ToddM