
[Condor-users] (no subject)



Hi,
I found a message that Bruce Beckles posted to the condor-users list a while back, but nobody seems to have replied to its questions. I have similar
questions about a user job wrapper that I have written. If anybody could shed some light on any of these questions, it would help me a great deal.
The questions are as follows:

"- If a user uses the kill_sig command in their submit description
file, does Condor (a) check the value given to ensure it is a valid
signal, and (b) restrict that value in any way (for instance, it doesn't
make sense for it to be SIGSTOP (23))?
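
Whatever Condor does, a wrapper can check the value itself. A minimal Python sketch follows; parse_kill_sig and NON_TERMINATING are hypothetical names, the set simply lists signals whose POSIX default action is to stop, continue or ignore a process, and nothing here describes what Condor actually validates.

```python
import signal

# Hypothetical: signals whose POSIX default action is stop, continue or
# ignore, i.e. they would not end the job. Condor is NOT known to use this list.
NON_TERMINATING = {
    signal.SIGSTOP, signal.SIGTSTP, signal.SIGTTIN, signal.SIGTTOU,
    signal.SIGCONT, signal.SIGCHLD, signal.SIGURG, signal.SIGWINCH,
}


def parse_kill_sig(value: str) -> signal.Signals:
    """Resolve a kill_sig-style value ("SIGTERM", "TERM" or "15").

    Raises ValueError if the value names no known signal, or names one
    that would not terminate a process under its default disposition.
    """
    text = value.strip()
    if text.isdigit():
        try:
            sig = signal.Signals(int(text))
        except ValueError:
            raise ValueError(f"unknown signal number: {text}") from None
    else:
        name = text.upper()
        if not name.startswith("SIG"):
            name = "SIG" + name
        try:
            sig = signal.Signals[name]
        except KeyError:
            raise ValueError(f"unknown signal name: {text}") from None
    if sig in NON_TERMINATING:
        raise ValueError(f"{sig.name} does not terminate a process by default")
    return sig
```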

- Scouring the manual I've discovered the following settings that affect
how long Condor will wait before escalating its attempt to stop the
job/its daemons:
    KILLING_TIMEOUT:
        length of time after starting to vacate job before a SIGKILL
        is sent
    SHUTDOWN_FAST_TIMEOUT:
        length of time daemons are given to perform a fast shutdown
        before they are killed outright
    SHUTDOWN_GRACEFUL_TIMEOUT:
        length of time daemons are given to do a graceful shutdown
        before they do a hard shutdown
Are there any other settings affecting this area that I've missed?
What constitutes a "hard shutdown" in this context? Is it just sending
SIGKILL?


- The example init boot script included in the Condor distribution sends a
SIGQUIT to the condor_master to initiate shutdown of Condor. The
comments in this script say:
# send SIGQUIT to the condor_master, which initiates its fast
# shutdown method. The master itself will start sending
# SIGKILL to all it's children if they're not gone in 20
# seconds.

Is this interval of 20 seconds correct (the comments at the top of the
script are dated 1998, so it may have changed since then)? Is this
interval hard-coded, or can it be changed? If it can be changed, how?
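
The escalation pattern the script comment describes (a first signal, a grace period, then SIGKILL) can be sketched in Python as below. This only illustrates the pattern for a child the caller started itself; terminate_with_escalation is a hypothetical name, and the defaults of SIGQUIT and 20 seconds are copied from the script comment, not taken from Condor's source.

```python
import signal
import subprocess


def terminate_with_escalation(proc, first_sig=signal.SIGQUIT, grace=20.0):
    """Send first_sig to a child started with subprocess.Popen and, if it
    has not exited after `grace` seconds, send SIGKILL.

    Returns the child's return code; subprocess reports death by signal N as -N.
    """
    proc.send_signal(first_sig)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; it cannot be caught or ignored
        return proc.wait()
```

Because wait() only works on one's own children, an unrelated PID (such as an already-running condor_master) would need polling instead.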


- The SIGQUIT, SIGHUP and SIGTERM signals are all handled by the DaemonCore
library, and so presumably might be sent by a Condor process to a Condor
daemon. Are SIGHUP and SIGQUIT ever sent by Condor to any processes
which are _not_ Condor daemons?
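
One way to find out empirically is to have the job log every such signal it receives. A minimal Python sketch, with install_signal_logger as a hypothetical helper; note that installing a handler replaces the default action, so a real wrapper would also need to forward or re-raise the signal.

```python
import signal
import time


def install_signal_logger(path, signals=(signal.SIGHUP, signal.SIGQUIT, signal.SIGTERM)):
    """Append "<unix time> <SIGNAME>" to `path` whenever one of `signals`
    arrives. Returns the previous handlers so the caller can restore them.

    Installing a handler replaces the default action: the process will no
    longer die from these signals unless the caller forwards or re-raises them.
    """
    def log(signum, _frame):
        # Python runs this in the main thread between bytecodes, so ordinary
        # file I/O is safe here (unlike inside a C signal handler).
        with open(path, "a") as f:
            f.write(f"{time.time():.3f} {signal.Signals(signum).name}\n")

    return {sig: signal.signal(sig, log) for sig in signals}
```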


- Condor detects if the job exits via a signal. Suppose my job (J) is
actually just a wrapper for some other program/shell script (P).
Suppose that after spawning P, J just waits for P to terminate and then
exits. If P exits via a signal, will Condor regard that as the job
exiting via a signal, or will it regard it as "normal termination" (as J
has exited "normally")?
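
Under POSIX wait semantics, J's parent (the condor_starter) can only see J's own wait status, so if J simply exits after P is killed, that status is a normal exit. A common workaround, sketched below in Python, is for the wrapper to reset the signal to its default action and send it to itself. child_status and mirror_status are hypothetical names, and whether Condor then reports the job as exiting via a signal is exactly what the question asks.

```python
import os
import signal
import subprocess
import sys


def child_status(argv):
    """Run argv to completion and report how it ended.

    Returns ("exit", code) for a normal exit, or ("signal", signum) if the
    child was killed by a signal (subprocess reports that as a negative
    return code on POSIX).
    """
    rc = subprocess.run(argv).returncode
    if rc < 0:
        return ("signal", -rc)
    return ("exit", rc)


def mirror_status(kind, value):
    """End this wrapper the same way its child ended, so the wrapper's own
    parent sees a death by signal rather than a normal exit."""
    if kind == "signal":
        try:
            # Restore the default action so the signal actually terminates
            # us; SIGKILL cannot be reset, hence the except clause.
            signal.signal(value, signal.SIG_DFL)
        except (OSError, ValueError):
            pass
        os.kill(os.getpid(), value)
        # Only reached if the default action does not terminate; fall back
        # to the shell convention of 128 + signal number.
        sys.exit(128 + value)
    sys.exit(value)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wrapper.py P [args...]
    mirror_status(*child_status(sys.argv[1:]))
```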


- In the Vanilla, Java, MPI, PVM and Scheduler universes, when Condor
vacates the job gracefully by sending it a SIGTERM (or whatever the
KillSig ClassAd attribute has been set to), does it send this signal
just to the immediate child of the condor_starter, or to all the
processes (if any) spawned by that child as well? "
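
If only the immediate child is signalled, a wrapper can relay the signal to its whole subtree by running P in its own process group. A minimal Python sketch, assuming a POSIX system; run_forwarding and its default list of relayed signals are hypothetical, and any process that starts its own session or process group would still escape the relay.

```python
import os
import signal
import subprocess


def run_forwarding(argv, forward=(signal.SIGTERM, signal.SIGINT,
                                  signal.SIGQUIT, signal.SIGHUP)):
    """Run argv in its own process group and relay the signals in `forward`
    to the whole group, so grandchildren are signalled too.

    Returns the child's return code (-N if it died from signal N).
    """
    # start_new_session=True runs setsid() in the child, so the child's PID
    # is also its process-group ID.
    child = subprocess.Popen(argv, start_new_session=True)

    def relay(signum, _frame):
        try:
            os.killpg(child.pid, signum)
        except ProcessLookupError:
            pass  # the group has already exited

    # A signal arriving before these handlers are installed still gets the
    # previous (usually default) action.
    previous = {sig: signal.signal(sig, relay) for sig in forward}
    try:
        return child.wait()
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```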


Thanks,
Tan

--
Tanzima Zerin Islam
Graduate Student
School of Electrical & Computer Engineering
Purdue University