Re: [Condor-devel] serious bug in head of 6_7-branch - EXCEPT() kills other daemons
- Date: Wed, 21 Jun 2006 15:58:53 -0500
- From: Zachary Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [Condor-devel] serious bug in head of 6_7-branch - EXCEPT() kills other daemons
On Tue, Jun 20, 2006 at 12:32:22PM -0500, Erik Paulson wrote:
>
> A misconfigured quill++ hit an EXCEPT() this morning, but the strange
> thing was that all of the other daemons exited as well with SIGTERMS.
i looked into this with greg q and we figured out what is happening.
bottom line: your mailer is misconfigured. it's probably set to
/usr/bin/mail instead of /bin/mail. (it seems to have moved with
the recent CentOS upgrades)
the bug has existed since the beginning of time, so we did not stop
the 6.7.20 release process. we will fix it before 6.8.0, however.
gritty details:
when a daemon excepts, the master attempts to send an obituary
for it. in condor_util_lib/email.c:email_open_implementation,
we execvp() the mailer. if that fails, we EXCEPT. but in this
case, the EXCEPT happens in the child of a fork(), so the error
message gets lost, and because the child is a copy of the master,
the master's EXCEPT handler runs there and takes out all of the
master's children.
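
to make the mechanism concrete, here is a toy model in C. it is not the condor code: toy_except(), except_hook and master_cleanup() are made-up stand-ins for EXCEPT() and the master's handler, and the "cleanup" only writes a byte to a pipe instead of signalling anything. it only shows that a handler installed by the parent before fork() still runs when the child hits the failure path after a failed execvp().

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical stand-ins; none of these names come from condor */
static int report_fd = -1;
static void (*except_hook)(void) = NULL;

/* the real handler would signal the other daemons; this one only
 * records that it ran by writing one byte to a pipe */
static void master_cleanup(void) {
    if (report_fd >= 0 && write(report_fd, "K", 1) != 1) {
        /* nothing useful to do in this toy if the write fails */
    }
}

/* toy version of a fatal-error macro: run the installed hook, then die */
static void toy_except(void) {
    if (except_hook) except_hook();
    _exit(1);
}

/* returns 1 if the parent's hook ran inside the forked child,
 * 0 if it did not, -1 on setup failure */
int cleanup_ran_in_child(const char *mailer) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    /* close the write end automatically if the exec succeeds */
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);
    report_fd = fds[1];
    except_hook = master_cleanup;   /* installed before the fork */

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);
        char *const argv[] = { (char *)mailer, NULL };
        execvp(mailer, argv);
        toy_except();               /* only reached if execvp() failed */
    }

    close(fds[1]);
    report_fd = -1;
    char c = 0;
    ssize_t n = read(fds[0], &c, 1); /* 0 bytes means the hook never ran */
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return (n == 1 && c == 'K') ? 1 : 0;
}
```

calling cleanup_ran_in_child() with a path that does not exist should report that the hook ran in the child, which is the same shape as the failure described above.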
new plan:
before the fork, test to see if the mailer is executable. if
not, print a message to the log and don't bother forking. also,
if the execvp() call fails, just exit instead of EXCEPTing.
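
a minimal sketch of that plan in C, under some assumptions: run_mailer_checked() is a hypothetical helper, not the actual change to email.c; the mailer is given as a full path (access() does not search PATH the way execvp() does); and stderr stands in for the daemon log. note also that access() checks against the real uid, which may or may not be what the master wants.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical helper: returns 0 if the mailer ran and exited 0,
 * -1 if it was not executable, could not be started, or failed */
int run_mailer_checked(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);

    /* step 1: test before forking, and log instead of forking */
    if (access(path, X_OK) != 0) {
        fprintf(stderr, "mailer %s is not executable, not sending email\n",
                path);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        execvp(path, argv);
        /* step 2: exec failed anyway (e.g. the file vanished in between);
         * leave quietly with _exit() so no handlers inherited from the
         * parent run in the child */
        _exit(127);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

the access() check is only a fast path for the common misconfiguration; the _exit() after execvp() is still needed because the file can change between the check and the exec.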
cheers,
-zach