
Re: [HTCondor-users] exit hook not always report correct ImageSize



To piggyback on this thread:

What units are all the memory ClassAd attributes in, binary or decimal?

Cheers,
Matt

On 27/01/2023 03:35, JM wrote:

Todd,

By the way, I noticed that the ResidentSetSize passed on stdin is actually the ResidentSetSize_RAW value (not rounded up) for the lifetime of the exit hook. It would be nice for this to be consistent everywhere and at all times.

Thanks!

J.

On Thu, Jan 26, 2023 at 9:15 PM JM <jm@xxxxxxxxxxxxxxxxxxxx> wrote:
Todd,

Reading from stdin works great.

As for updating .job.ad upon job exit, that sounds like a good idea, since it would leave the file in a consistent state.

Thank you.

J.

On Thu, Jan 26, 2023 at 4:51 PM Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 1/26/2023 1:25 PM, JM wrote:
HTCondor users,

I have an exit hook that sends .job.ad to a database. However, I noticed that in some cases, which I have not been able to pin down, ImageSize is 1250 instead of the real ImageSize reported by condor_history. The jobs run much longer than 15 seconds, so I would expect the startd to have updated .job.ad. I even tried sleeping for 30 seconds in the exit hook to make sure the update happens.

Does anyone have a clue why?

Hi,

I did not positively confirm this, but my guess is the .job.ad file sitting in the scratch directory is written at the start of job execution, and not re-written every time the job ad is updated. 

However, note that HTCondor will give a current/updated copy of the job classad to your exit hook script via stdin [*]. Instead of having your exit hook read the .job.ad file, I suggest you use the information passed to it via stdin. Let us know if you have any additional problems or questions here. It would not be a big deal for us to patch HTCondor to update the .job.ad upon job exit (i.e. before invoking the exit hook), but using the standard input should do what you want today.

Hope this helps,
Todd

[*] = In the manual at link:
   https://htcondor.readthedocs.io/en/latest/admin-manual/hooks.html#work-fetching-hooks-invoked-by-htcondor
look for "HOOK_JOB_EXIT" and note what it says in the section "Standard input given to the hook".
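
A minimal sketch of the stdin approach described above, for illustration only. It assumes the job ad arrives on stdin as plain text with one "Name = Value" attribute per line; the helper names parse_classad_text and memory_attrs are hypothetical and not part of HTCondor, and values are kept as raw strings rather than evaluated as ClassAd expressions.

```python
#!/usr/bin/env python3
"""Sketch of an exit hook that reads the job ad from stdin, not .job.ad."""
import sys


def parse_classad_text(text):
    """Parse 'Name = Value' lines into a dict of raw string values.

    Assumption: one attribute per line. Lines without '=' (blank lines,
    separators) are skipped. Values are not evaluated or unquoted.
    """
    ad = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values such as
        # (Arch == "X86_64") are left intact.
        name, sep, value = line.partition("=")
        if not sep:
            continue
        ad[name.strip()] = value.strip()
    return ad


def memory_attrs(ad, names=("ImageSize", "ResidentSetSize")):
    """Return the requested attributes that are present as plain integers."""
    out = {}
    for name in names:
        try:
            out[name] = int(ad[name])
        except (KeyError, ValueError):
            # Attribute missing, or not a plain integer literal.
            pass
    return out


def main():
    ad = parse_classad_text(sys.stdin.read())
    # Placeholder for the real work (e.g. sending the ad to a database):
    # here the memory attributes are only logged to stderr.
    for name, value in memory_attrs(ad).items():
        print(f"{name} = {value}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

The database upload from the original exit hook would replace the stderr logging in main(); the rest of the hook stays the same, except that its input now comes from stdin instead of the .job.ad file in the scratch directory.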



-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257


-- 
Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
www.exeter.ac.uk/research/researchcomputing/support/researchit
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom

Please note, I may send emails out of 'normal' working hours, as this fits my own work-life balance. I do not expect a response outside of your own working hours.