Re: [HTCondor-users] pre-kill warning signals to jobs?
- Date: Wed, 20 Mar 2024 09:22:54 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] pre-kill warning signals to jobs?
On 3/20/24 03:35, Thomas Hartmann wrote:
Hi all,
A not fully fermented idea, but is there a way in Condor for the
startd to send its job a signal on a predefined condition, e.g.,
a warning when memory utilization gets close to the requested
limit?
Hi Thomas:
I like where you are going, but this may be hard to do with the tools we 
have today. Perhaps we need to ferment (and then even distill!) in 
order to get something useful to work.
Today, the startd can define a WANT_VACATE expression, and the job can
define a custom soft-kill signal that is sent first when WANT_VACATE is
true. So, in theory, you could use these two together to send some custom
signal (SIGUSR1, maybe?). HOWEVER, a job can allocate memory very quickly,
and there is a limit to how quickly the startd sees the memory usage of
the job. We'll still need a good way to notify the user. I wonder if there
is a way to push the Jupyter notebook into its own sub-cgroup of the
job, and let the kernel kill the notebook when it exceeds its memory
limit, leaving the parent job running to notify the user in some way?
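
For concreteness, the two knobs Greg mentions might look roughly like this. The threshold and the exact ClassAd expression are illustrative only, not tested policy; note also that in a stock configuration the PREEMPT expression usually drives eviction, with WANT_VACATE choosing a graceful (soft-kill) shutdown:

```
# startd configuration (sketch): prefer a graceful vacate when measured
# memory approaches the request. The 0.9 factor is an assumption, not a
# recommendation, and MemoryUsage is only updated periodically.
WANT_VACATE = (MemoryUsage =!= UNDEFINED) && (MemoryUsage > 0.9 * RequestMemory)
```

```
# submit file (sketch): ask HTCondor to deliver SIGUSR1 instead of the
# default SIGTERM as the soft-kill signal.
kill_sig = SIGUSR1
```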
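
On the job side, the process would need a handler for whatever soft-kill signal is configured. A minimal Python sketch (the checkpoint/notify action is a placeholder; here the script just signals itself to demonstrate the handler):

```python
import os
import signal
import sys

# Flag flipped when the soft-kill warning arrives.
evicting = False

def on_soft_kill(signum, frame):
    # With kill_sig = SIGUSR1, HTCondor delivers SIGUSR1 before the hard
    # kill, giving the job a grace period to checkpoint or notify the user.
    global evicting
    evicting = True
    print("received soft-kill warning; checkpointing", file=sys.stderr)

signal.signal(signal.SIGUSR1, on_soft_kill)

# Simulate the startd delivering the warning to this process.
os.kill(os.getpid(), signal.SIGUSR1)

print("evicting =", evicting)  # → evicting = True
```

The handler must finish its cleanup within the configured grace period, or the process is killed anyway.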
-greg