Re: [HTCondor-users] pre-kill warning signals to jobs?
- Date: Wed, 20 Mar 2024 09:22:54 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] pre-kill warning signals to jobs?
On 3/20/24 03:35, Thomas Hartmann wrote:
Hi all,
a not-fully-fermented idea, but is there a way in Condor for the
startd to send its job a signal on a predefined condition, e.g. a
warning when memory utilization gets close to the requested limit?
Hi Thomas:
I like where you are going, but this may be hard to do with the tools we
have today. Perhaps we need to ferment (and then even distill!) in
order to get something useful to work.
Today, the startd can define a WANT_VACATE, and the job can define a
custom soft-kill signal that will be first sent when WANT_VACATE is
true. So, in theory, you could use these two to send some custom signal
(SIGUSR1, maybe?). HOWEVER, a job can allocate memory very quickly, and
there is a limit to how fast the startd sees the memory usage of the
job. We'll still need a good way to notify the user. I wonder if there
is a way to push the Jupyter notebook into its own sub-cgroup of the
job, and let the kernel kill the notebook when it goes over memory,
leaving the parent job running to notify the user in some way?
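The WANT_VACATE route Greg describes might look roughly like the sketch below. This is only an illustration, not a tested policy: the 90% threshold is arbitrary, and in a typical startd policy PREEMPT is what actually triggers the eviction while WANT_VACATE selects a graceful vacate (which is when the soft-kill signal is delivered). As Greg notes, MemoryUsage is only sampled periodically, so a fast allocator can blow past the limit before this fires.

```
# condor_config on the execute node -- hedged sketch only
MEMORY_NEAR_LIMIT = TARGET.MemoryUsage > 0.9 * TARGET.RequestMemory
PREEMPT     = $(MEMORY_NEAR_LIMIT)
WANT_VACATE = $(MEMORY_NEAR_LIMIT)
```

```
# job submit file: deliver SIGUSR1 instead of the default SIGTERM,
# and allow some grace time before the hard kill
kill_sig            = SIGUSR1
job_max_vacate_time = 60
```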
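The sub-cgroup idea could be sketched from inside a job wrapper, assuming cgroup v2 and that the wrapper has write access to the job's own cgroup (which depends on how the startd delegates it). Note cgroup v2's no-internal-process rule: once the memory controller is enabled for children, the wrapper's own processes may need to sit in a sibling child cgroup rather than the parent. All names below except the standard cgroup v2 files are hypothetical:

```python
# Hedged sketch: derive this process's cgroup v2 directory and create a
# memory-capped child cgroup for the notebook. Actually writing these
# files requires the job to have been delegated its cgroup.
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")

def own_cgroup_dir(proc_cgroup_text: str) -> Path:
    """Map the single cgroup v2 line in /proc/self/cgroup
    (e.g. '0::/htcondor/cg_slot1') to its sysfs directory."""
    rel = proc_cgroup_text.strip().split("::", 1)[1].lstrip("/")
    return CGROUP_ROOT / rel

def make_notebook_cgroup(job_cg: Path, mem_limit_bytes: int) -> Path:
    """Create a child cgroup under the job's cgroup and cap its memory,
    so the kernel OOM-kills only the notebook, not the whole job."""
    nb = job_cg / "notebook"
    nb.mkdir(exist_ok=True)
    # Enable the memory controller for children, then set the cap.
    (job_cg / "cgroup.subtree_control").write_text("+memory")
    (nb / "memory.max").write_text(str(mem_limit_bytes))
    return nb

# Usage from the wrapper (hypothetical flow):
#   cg = own_cgroup_dir(Path("/proc/self/cgroup").read_text())
#   nb = make_notebook_cgroup(cg, 7 * 2**30)
#   ...start the notebook, then move its pid:
#   (nb / "cgroup.procs").write_text(str(notebook_pid))
```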
-greg