Re: [Condor-users] condor_rm not killing subprocesses
- Date: Fri, 03 Jun 2005 14:01:11 -0400
- From: Jacob Joseph <jmjoseph@xxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] condor_rm not killing subprocesses
I thought I might include a quick example of the bash behavior I
mentioned. Run test3.sh, then send it various signals. You'll see that
test2.sh never receives them; it keeps running until it receives a
SIGKILL itself.
-Jacob
------------------------------------------
$ cat test2.sh
#!/bin/bash
trap trap_int INT
trap trap_hup HUP
trap trap_term TERM
trap_int() { echo int; }
trap_hup() { echo hup; }
trap_term() { echo term; }
while (( 1 )); do true; done
------------------------------------------
$ cat test3.sh
#!/bin/bash
for x in "0"; do
    ./test2.sh
done
------------------------------------------
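For anyone who wants to reproduce this without Condor, here is a minimal sketch that contrasts signalling only the parent shell with signalling its whole process group. It is an illustration under stated assumptions, not a description of what Condor does. The child.sh and parent.sh files, the LOG variable, and the temporary directory are hypothetical stand-ins for test2.sh and test3.sh. Unlike test2.sh, the child here exits on TERM so nothing is left running, and `set -m` is used only so that the background job becomes the leader of its own process group.

```shell
#!/bin/bash
# Sketch: signal one process versus its whole process group.
# child.sh and parent.sh are hypothetical stand-ins for test2.sh and test3.sh.

work=$(mktemp -d)
export LOG="$work/signals.log"   # the child appends here when it sees SIGTERM
: > "$LOG"

# Stand-in for test2.sh: record SIGTERM, then exit (the original only echoes).
cat > "$work/child.sh" <<'EOF'
trap 'echo term >> "$LOG"; exit 0' TERM
while true; do sleep 0.2; done
EOF

# Stand-in for test3.sh: start the child from inside a loop.
cat > "$work/parent.sh" <<'EOF'
for x in "0"; do
    bash "$(dirname "$0")/child.sh"
done
EOF

# With job control on, a background job is placed in its own process
# group, so the PID in $! is also the process group ID.
set -m
bash "$work/parent.sh" &
pgid=$!
set +m
sleep 1   # give the child time to install its trap

# 1. Signal only the parent shell.
kill -TERM "$pgid"
sleep 1
after_parent=$(cat "$LOG")

# 2. Signal every process in the group (negative PGID argument).
kill -TERM -- "-$pgid"
sleep 1
after_group=$(cat "$LOG")

echo "parent only:   child logged '${after_parent}'"
echo "process group: child logged '${after_group}'"

# Clean up anything still running, then remove the scratch files.
kill -KILL -- "-$pgid" 2>/dev/null
rm -rf "$work"
```

The first line should show an empty string and the second should show term: the child only sees the signal when it is sent to the process group.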
Jacob Joseph wrote:
> Thanks for the reply. I'm not sure it solves my troubles though. Does
> condor send a SIGTERM only to the parent bash process it spawned? If
> so, I can reproduce the behavior outside of condor by simply killing
> (SIGTERM) the bash script. Bash does not forward this signal to
> processes started from within a loop. I believe the correct terminology
> is that it is no longer the controlling shell. The end result is that
> Condor never ends up getting a signal to the subprocess, which continues
> running.
>
> What does work is to send a kill to all processes in the same process
> group ID. (kill does this with a negative <pgid> argument). Is there a
> way to have condor do this as well? Can condor be modified? Can condor
> spawn my own script to accomplish this?
>
> -Jacob
>
> Mark Silberstein wrote:
>
>>It seems that your Condor setup doesn't give the program time to
>>finish cleanly when Condor evicts it; look at the KILL expression.
>>Usually Condor first tries to kill the job with SIGTERM, and then, once
>>the KILL expression becomes true, it kills it with -9. It seems that bash
>>doesn't get a chance to clean up all its processes, which it does when
>>you kill it with Ctrl-C.
>>You may also want to specify kill_sig=SIGQUIT, which will cause Condor
>>to kill it with SIGQUIT first.
>>
>>
>>
>>On Fri, 2005-06-03 at 01:18 -0400, Jacob Joseph wrote:
>>
>>
>>>Hi. I have a number of users who have taken to wrapping their jobs
>>>within shell scripts. Often, they'll use a for or while loop to execute
>>>a single command with various permutations. When such a job is removed
>>>with condor_rm, the main script is killed, but subprocesses spawned from
>>>inside a loop will not be killed and will continue to run on the compute
>>>machine. This naturally interferes with jobs which are later assigned
>>>to that machine.
>>>
>>>Does anyone know of a way to force bash subprocesses to be killed along
>>>with the parent upon removal with condor_rm? (This behavior is not
>>>unique to condor_rm. A kill to the parent also leaves the subprocess
>>>running.)
>>>
>>>-Jacob
>>>_______________________________________________
>>>Condor-users mailing list
>>>Condor-users@xxxxxxxxxxx
>>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users