Re: [Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM
- Date: Wed, 19 Dec 2007 11:58:06 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM
Rob de Graaf wrote:
> If we can't "catch" jobs that are being killed outside condor, I suppose
> the only way is to re-queue them after reviewing the logs with non-zero
> return values?
Of course, the worry there is what happens if your job actually exits with
a non-zero value on its own.
Another idea is to ask Condor to rerun the job if it is killed with a
SIGTERM or a SIGQUIT signal. It seems unlikely that a job would exit of
its own accord with either of those signals.
Off the top of my head, I think you could do the above by placing the
following in your condor submit file (signal 3 is SIGQUIT and signal 15
is SIGTERM):
on_exit_remove = (ExitBySignal == False) || ((ExitSignal != 3) && (ExitSignal != 15))
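
To make the logic of that expression concrete, here is a rough Python sketch of how it would evaluate for a few exit cases. The function name and its arguments are made up for illustration and are not part of Condor; it only mirrors the boolean logic, assuming the usual Linux signal numbers.

```python
# Signal numbers used in the submit-file expression (assumed: Linux numbering).
SIGQUIT = 3
SIGTERM = 15


def on_exit_remove(exit_by_signal, exit_signal=None):
    """Hypothetical mirror of the on_exit_remove expression.

    Returns True if the job should leave the queue,
    False if it should stay queued and be rerun.
    """
    # A job that exited normally (any return value) is removed.
    if not exit_by_signal:
        return True
    # A job killed by a signal is removed unless the signal
    # was SIGQUIT or SIGTERM, in which case it is rerun.
    return exit_signal != SIGQUIT and exit_signal != SIGTERM


if __name__ == "__main__":
    print(on_exit_remove(False))          # normal exit
    print(on_exit_remove(True, SIGTERM))  # killed with SIGTERM
    print(on_exit_remove(True, SIGQUIT))  # killed with SIGQUIT
    print(on_exit_remove(True, 9))        # killed with some other signal
```

In other words, only a job killed by SIGQUIT or SIGTERM stays in the queue; a job that exits by itself with a non-zero value is still removed.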
Hope this is helpful,
Todd