Re: [Condor-users] USER_JOB_WRAPPER and Unix signals
- Date: Mon, 16 Aug 2004 09:35:31 +0100 (BST)
- From: Bruce Beckles <mbb10@xxxxxxxxx>
- Subject: Re: [Condor-users] USER_JOB_WRAPPER and Unix signals
On Wed, 11 Aug 2004, Dan Bradley wrote:
<snip>
Thanks, Dan, for your comments - I have a few more questions on Condor's
use of Unix/Linux signals which I hope you or someone else can help me
with:
- If a user uses the kill_sig command in their submit description
file, does Condor (a) check the value given to ensure it is a valid
signal, and (b) restrict that value in any way (for instance, it doesn't
make sense for it to be SIGSTOP (23))?
- Scouring the manual I've discovered the following settings that affect
how long Condor will wait before escalating its attempt to stop the
job/its daemons:
    KILLING_TIMEOUT:
        length of time after starting to vacate the job before a
        SIGKILL is sent
    SHUTDOWN_FAST_TIMEOUT:
        length of time daemons are given to perform a fast shutdown
        before they are killed outright
    SHUTDOWN_GRACEFUL_TIMEOUT:
        length of time daemons are given to do a graceful shutdown
        before they do a hard shutdown
Are there any other settings affecting this area that I've missed?
What constitutes a "hard shutdown" in this context? Is it just sending
SIGKILL?
- The example init boot script included in the Condor distribution sends a
SIGQUIT to the condor_master to initiate shutdown of Condor. The
comments in this script say:
    # send SIGQUIT to the condor_master, which initiates its fast
    # shutdown method. The master itself will start sending
    # SIGKILL to all it's children if they're not gone in 20
    # seconds.
Is this interval of 20 seconds correct (the comments at the top of the
script are dated 1998, so it may have changed since then)? Is this
interval hard-coded, or can it be changed? If it can be changed, how?
- SIGQUIT, SIGHUP and SIGTERM are all handled by the DaemonCore
library, and so presumably might be sent by a Condor process to a Condor
daemon. Are SIGHUP and SIGQUIT ever sent by Condor to any processes
which are _not_ Condor daemons?
- Condor detects if the job exits via a signal. Suppose my job (J) is
actually just a wrapper for some other program/shell script (P).
Suppose that after spawning P, J just waits for P to terminate and then
exits. If P exits via a signal, will Condor regard that as the job
exiting via a signal, or will it regard it as "normal termination" (as J
has exited "normally")?
- In the Vanilla, Java, MPI, PVM and Scheduler universes, when Condor
vacates the job gracefully by sending it a SIGTERM (or whatever the
KillSig ClassAd attribute has been set to), does it send this signal
just to the immediate child of the condor_starter, or to all the
processes (if any) spawned by that child as well?
> Bruce Beckles wrote:
<snip>
> >...and so I need to know what signals Condor will send to the user job -
> >trawling the manual seems to reveal the following:
> >
> >- SIGUSR2:
> > cause a job in the Standard universe to checkpoint and then continue
> > executing.
> >
> >- SIGTSTP (or the value of the KillSig ClassAd attribute):
> > cause a job in the Standard universe to try and gracefully shutdown
> > (i.e. checkpoint).
> >
> >- SIGTERM (or the value of the KillSig ClassAd attribute):
> > cause a job in the Vanilla universe to try and gracefully shutdown,
> > i.e. normal Unix termination (noting that the program may catch
> > SIGTERM and try to clean up). Is this also true for jobs in the other
> > non-Standard (Java, MPI, PVM and Scheduler) universes?
> >
> >- SIGKILL:
> > kill (i.e. send the hard-kill signal to) the job, if the job takes too
> > long to gracefully shutdown or doesn't respond to the appropriate
> > signal.
<snip>
Apart from SIGSTOP/SIGCONT for suspending/continuing a job, are there any
other signals I missed? (Obviously the user can set the KillSig ClassAd
attribute to a signal I've not listed above...)
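The escalation described in the quoted list (a soft-kill signal first, then SIGKILL if the job takes too long) can be sketched in Python as below. This is only an illustration of the pattern under my own assumptions, not a description of how Condor implements KILLING_TIMEOUT. The names stop_with_escalation, killing_timeout and demo are hypothetical.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc, soft_sig=signal.SIGTERM, killing_timeout=5.0):
    """Send soft_sig, wait up to killing_timeout seconds, then SIGKILL.

    Returns the number of the signal that ended the process, or None if
    the process exited with an ordinary exit status.
    """
    proc.send_signal(soft_sig)
    try:
        proc.wait(timeout=killing_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
    return -proc.returncode if proc.returncode < 0 else None


def demo():
    """Run the escalation against a stand-in job that ignores SIGTERM."""
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; "
         "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
         "print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE)
    stubborn.stdout.readline()  # wait until SIGTERM is being ignored
    return stop_with_escalation(stubborn, killing_timeout=0.5)
```

A job that handles the soft signal promptly never reaches the SIGKILL branch, so the timeout only matters for jobs that ignore the signal or are slow to clean up.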
Any answers/information gratefully received!
Thanks,
Bruce
--
Bruce Beckles,
e-Science Specialist,
University of Cambridge Computing Service.