Re: [HTCondor-users] Unable to unset monitors in claim destructor. The StartOfJob* attributes will be stale.
- Date: Tue, 20 Aug 2019 16:44:51 -0500 (CDT)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Unable to unset monitors in claim destructor. The StartOfJob* attributes will be stale.
08/18/19 15:26:23 slot2_1: Unable to unset monitors in claim destructor.
The StartOfJob* attributes will be stale. (0x1d215c0, (nil))
If GPU monitoring is turned on, GPU usage information is recorded
in the slot ad, and assigned to a job as it runs on that slot. When a job
starts, we record the slot's current usage in the slot ad; then we compute
the job's usage by subtracting this from the ongoing accumulation of
usage, until the job ends. Of course, if the claim is deleted, we need to
make sure that the information we recorded about the start of the job is
deleted, too; otherwise, the slot will report usage for a job that's no
longer running. (This won't screw up accounting, because that only counts
assigned resources, not actual usage.)
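
The bookkeeping described above can be sketched roughly as follows. This
is only an illustration of the start-of-job snapshot idea, not the startd's
actual code: the Slot class, its method names, and the attribute names
"Usage" and "StartOfJobUsage" are hypothetical placeholders that merely
follow the StartOfJob* naming pattern from the log message.

```python
# Hypothetical sketch of the snapshot-and-subtract bookkeeping.
# Class, method, and attribute names are illustrative assumptions,
# not HTCondor's real implementation.

class Slot:
    def __init__(self):
        # Cumulative usage reported by the monitor for this slot.
        self.ad = {"Usage": 0.0}

    def accumulate(self, amount):
        # The monitor keeps adding to the slot's running total.
        self.ad["Usage"] += amount

    def start_job(self):
        # Snapshot the slot's current usage when a job starts.
        self.ad["StartOfJobUsage"] = self.ad["Usage"]

    def job_usage(self):
        # Job usage = ongoing accumulation minus the start-of-job snapshot.
        start = self.ad.get("StartOfJobUsage")
        if start is None:
            return None  # no job is being tracked
        return self.ad["Usage"] - start

    def unset_monitors(self):
        # On claim deletion, drop the snapshot so the slot stops
        # reporting usage for a job that is no longer running.
        self.ad.pop("StartOfJobUsage", None)
```

If unset_monitors() is skipped, the snapshot stays in the ad and
job_usage() keeps returning a value after the job is gone, which is the
"stale" condition the warning refers to.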
However, in some cases, a claim will be deleted whose ClassAd has
already been deleted. In those cases, we can't (presently) determine
which monitors to unset, and so we do nothing. This _should_ only happen
when the slot is being destroyed, in which case it's harmless, but I've
been unable to prove that's the case.
That said, while I was refreshing my memory about this, Jaime
found a place in the code where a one-line change might substantially
reduce the occurrence of these warnings; we'll see how that goes.
- ToddM