On 1/26/2023 9:35 PM, JM wrote:
Todd,
By the way, I noticed that ResidentSetSize from stdin is
actually ResidentSetSize_RAW (not rounded up) during the life
of the exit hook. It would be nice to be consistent everywhere
and at any time.
Thanks!
J.
Hi J,
Glad I could be of assistance!
Unfortunately, from a scalability standpoint, having data
consistency everywhere all the time is not feasible in a distributed
system like HTCondor; the best we can hope for is eventual
consistency between what all the components "see" (starter, startd,
shadow, schedd, collector, etc.).
As for attribute naming consistency, I totally agree with you. Some
background on the naming of job attributes like ResidentSetSize
versus ResidentSetSize_RAW: the condor_schedd is the
component that rounds up the ResidentSetSize_RAW value into bucketed
values in order to batch up matchmaking requests [*]. For example, when the
schedd is requesting resources from the central manager, this
rounding allows the schedd to make one request like "give me
50 machines with 2GB RAM", instead of making 50 separate requests
like "give me one machine with 1.87GB RAM, and another machine with
1.89GB RAM, ...". In hindsight, the raw value should have
just been named "ResidentSetSize" everywhere, and the rounded value
something else like "ResidentSetSize_Rounded" just in the job ad
inside the schedd, but changing it retroactively in a manner that
maintains backwards compatibility everywhere is not trivial.
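To make the batching idea concrete, here is a minimal sketch in Python. It assumes fixed-size buckets of 250 MB, which is purely illustrative and is not the rounding rule the schedd actually applies; the function names and sample values are hypothetical as well.

```python
from collections import Counter


def round_up_to_bucket(raw_mb, bucket_mb):
    """Round a raw memory value (in MB) up to the next multiple of bucket_mb."""
    # Integer ceiling division, so no floating-point error creeps in.
    return -(-raw_mb // bucket_mb) * bucket_mb


def batch_requests(raw_values_mb, bucket_mb=250):
    """Group jobs by rounded memory value.

    Returns a Counter mapping each rounded value to the number of jobs
    that can share one request of that size.
    """
    return Counter(round_up_to_bucket(v, bucket_mb) for v in raw_values_mb)
```

With this sketch, jobs whose raw values are 1870 MB and 1890 MB both land in the 2000 MB bucket, so they can be covered by a single request for two machines rather than two distinct requests.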
Hope the above helps,
Todd
Todd,
Reading from stdin works great.
Regarding updating .job.ad upon job exit, it sounds like a good
idea to do so, to keep this file in a consistent state.
Thank you.
J.
On 1/26/2023 1:25 PM, JM wrote:
HTCondor users,
I have an exit hook that sends .job.ad to a database.
However, I noticed that in some cases
ImageSize is 1250 instead of the real ImageSize
reported by condor_history. The jobs run much longer than
15 seconds, so I would expect the startd to update .job.ad.
I even tried sleeping 30 seconds in the exit hook to make sure the
update happens.
Does anyone have a clue why?
Hi,
I did not positively confirm this, but my guess is the .job.ad
file sitting in the scratch directory is written at the start of job
execution, and not re-written every time the job ad is
updated.
However, note that HTCondor will give a current/updated
copy of the job classad to your exit hook script via
stdin [*]. Instead of having your exit hook read the .job.ad
file, I suggest you use the information passed to it via stdin.
Let us know if you have any additional problems or questions here.
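As an illustration of the stdin approach, here is a minimal Python sketch of the parsing side of an exit hook. It assumes the job ClassAd arrives on stdin as one "Name = Value" attribute per line; the helper names and the choice of attributes to keep are hypothetical, and the database upload itself is left out.

```python
import sys


def parse_job_ad(text):
    """Parse 'Name = Value' lines into a dict of raw value strings."""
    ad = {}
    for line in text.splitlines():
        # Split on the first '=' only; values may themselves contain '=='.
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            continue  # skip blank or malformed lines
        ad[name.strip()] = value.strip()
    return ad


def as_int(value):
    """Return value as an int, or None if it is missing or not an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def summarize(ad):
    """Pick out the attributes discussed in this thread (illustrative choice)."""
    keys = ("ClusterId", "ProcId", "ImageSize", "ResidentSetSize")
    return {key: as_int(ad.get(key)) for key in keys}


def main(stream=sys.stdin):
    """Read the job ad from the given stream and return the summary record."""
    return summarize(parse_job_ad(stream.read()))

# In a real hook script, finish by calling main() and passing the
# returned record to your own database code.
```

The parser keeps every value as a raw string and converts only the few numeric attributes it needs, so string and expression attributes pass through untouched.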
It would not be a big deal for us to patch HTCondor to
update the .job.ad upon job exit (i.e.
before invoking the exit hook), but using the standard
input should do what you want today.
Hope this helps,
Todd
[*] = In the manual at link:
https://htcondor.readthedocs.io/en/latest/admin-manual/hooks.html#work-fetching-hooks-invoked-by-htcondor
look for "HOOK_JOB_EXIT" and note what it says in the
section "Standard input given to the hook".
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685