I'm a little confused by your note that there is no operating-system support. I
have pointed out a reliable way of finding these processes, at least on
Linux. Now I'm just looking for some way of having Condor use this method, even
if it means wrapping the condor executables.
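Just to make that concrete, here is a rough, untested sketch of the sort of
wrapper I have in mind - the process-group handling is my own assumption, not
something Condor does for you:

#!/bin/bash
# Hypothetical wrapper: run the real job in its own process group so that
# everything it forks - even a double-forked grandchild, which stays in the
# same group - can be signalled together.  "$1" is assumed to be the real
# executable; the remaining arguments are passed straight through.
real_job="$1"; shift

# setsid starts the job in a new session, and therefore a new process group.
setsid "$real_job" "$@" &
pgid=$!   # in this simple case the new group's ID is the job's PID

# When the wrapper is told to go away, signal the whole group
# (the leading '-' before the PID means "process group").
cleanup() { kill -s TERM -- -"$pgid" 2>/dev/null; }
trap cleanup TERM INT EXIT

wait "$pgid"

Whether something like this can be hooked in cleanly is exactly what I'm
asking about.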
Mark is right - there really is no good OS support. All a process has
to do is fork twice and have the intermediate process exit; the
grandchild is then inherited by init. Condor's method of taking
snapshots of the process tree catches this - if it doesn't happen too
fast. The problem is that it frequently happens too fast.
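You can see the escape from a plain shell - nothing Condor-specific, just an
illustration:

# The subshell backgrounds 'sleep' and exits immediately, so the sleep is
# reparented to init before anything has a chance to snapshot the tree.
( sleep 600 & )

# By the time this runs, the sleep is no longer our descendant:
ps -o pid,ppid,pgid,cmd -C sleep

A snapshot taken after the reparenting can't tie that sleep back to the job
by parent PID alone.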
Mike Yoder
Principal Member of Technical Staff
Direct : +1.408.321.9000
Fax : +1.408.904.5992
Mobile : +1.408.497.7597
yoderm@xxxxxxxxxx
Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com
-Jacob
Mark Silberstein wrote:
Unfortunately there's not too much you can do - Condor's kill mechanism is
as simple as sending a kill to the process and to all of its children. That
seems OK, but the way Condor detects the children of the process is a bit
problematic, since there's no operating system support for this in Linux.
So it samples the process tree periodically. If you are unlucky enough
to issue condor_rm before Condor samples the process tree - too bad,
you've got a runaway child.
The only thing I think you can do is to run a cron job on all your
machines that does this garbage collection.
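Something along these lines, for example - just a sketch, and both the
account names and the "are any jobs still running here" test are assumptions
you'd have to adapt to your site:

#!/bin/bash
# Cron garbage collector for an execute machine.  Assumes jobs run under
# dedicated accounts (the names below are placeholders) and that nothing
# should be left running for them when no condor_starter is active.
if ! pgrep condor_starter >/dev/null 2>&1; then
    for user in condoruser1 condoruser2; do
        # Anything still owned by a job account is a runaway - kill it.
        pkill -TERM -u "$user"
    done
fi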
On Fri, 2005-06-03 at 14:24 -0400, Jacob Joseph wrote:
As I mentioned, it does work to kill off the PGID. Since I can't
realistically expect all of my users to clean up whatever they might
spawn, I'm looking for a method on the Condor side of things that
guarantees all jobs started by a user will be killed. Can anyone
suggest a method of modifying Condor's kill behavior?
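(By "kill off the PGID" I mean the negative-PID form, roughly:

# $pgid stands for the process group of the job's top-level script;
# '--' keeps the leading minus from being parsed as an option.
kill -s TERM -- -"$pgid"

which takes out every process still in that group, however it was forked.)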
-Jacob
Mark Silberstein wrote:
Hi
Let me correct my last mail - it's simply unbelievable. I checked my own
answer and I was totally wrong. When a bash script is killed, it leaves
its children alive. There are several threads on this in Google, and I
was curious enough to check. Indeed, it is claimed that there's no
simple solution to this problem.
So the only thing I would do is to trap EXIT in the script and kill all
the running child processes. It does work for this simple snippet:
# Name of the command the loop runs, reused by the cleanup handler below.
procname=sleep

# On exit, kill any still-running instances of that command.
clean() {
    killall $procname
}
trap clean EXIT

for i in {1..10}; do
    $procname 100
done
If you kill this script, sleep is killed.
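For example, run the snippet, note its PID, and from another shell kill that
PID - the EXIT trap should fire and killall takes the sleep down with it. One
caveat: killall matches by name, so it will also hit any unrelated 'sleep'
processes the same user owns; in a real job script you'd probably record the
child PIDs and kill those instead.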
Mark
On Fri, 2005-06-03 at 01:18 -0400, Jacob Joseph wrote:
Hi. I have a number of users who have taken to wrapping their jobs
within shell scripts. Often, they'll use a for or while loop to execute
a single command with various permutations. When such a job is removed
with condor_rm, the main script is killed, but subprocesses spawned from
inside a loop will not be killed and will continue to run on the compute
machine. This naturally interferes with jobs which are later assigned to
that machine.
Does anyone know of a way to force bash subprocesses to be killed along
with the parent upon removal with condor_rm? (This behavior is not
unique to condor_rm. A kill to the parent also leaves the subprocess
running.)
-Jacob
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users