HTCondor Project List Archives




Re: [Condor-devel] serious bug in head of 6_7-branch - EXCEPT() kills other daemons



On Tue, Jun 20, 2006 at 12:32:22PM -0500, Erik Paulson wrote:
> 
> A misconfigured quill++ hit an EXCEPT() this morning, but the strange
> thing was that all of the other daemons exited as well with SIGTERMS.

i looked into this with greg q and we figured out what is happening.

bottom line:  your mailer is misconfigured.  it's probably set to
/usr/bin/mail instead of /bin/mail.  (mail seems to have moved to
/bin/mail with the recent CentOS upgrades.)

the bug has been there since the beginning of time, so we did not
hold up the 6.7.20 release.  we will, however, fix it before 6.8.0.

gritty details:
  when a daemon excepts, the master attempts to send an obituary
  for it.  in condor_util_lib/email.c:email_open_implementation,
  we execvp() the mailer, and if that fails, we EXCEPT.  but this
  happens in the child of a fork(), which is still a copy of the
  master: the error message gets lost, and the master's EXCEPT
  handler, inherited by the child, takes out all of the master's
  children (the other daemons).
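a minimal C sketch of the general hazard, not Condor's actual code:
a forked child whose exec fails is still a copy of its parent, so any
exit path that runs the parent's cleanup runs it in the child too.
here atexit() and a hypothetical parent_cleanup() stand in for the
master's EXCEPT handler (which works differently in Condor); the
function names and the missing program path are illustrative
assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pipe write end the cleanup handler reports through; -1 in the parent. */
static int report_fd = -1;

/* Hypothetical stand-in for the master's EXCEPT-time cleanup, which in
   the real master shuts down every daemon it started. This one only
   writes a byte so the caller can tell that it ran. */
static void parent_cleanup(void)
{
    if (report_fd >= 0) {
        char c = 'x';
        ssize_t r = write(report_fd, &c, 1);
        (void)r;
    }
}

/* Fork, exec a program that does not exist, and report whether the
   parent's cleanup ran in the child: 1 if it did, 0 if not, -1 on error.
   quiet selects _exit() instead of exit() after the failed exec. */
int cleanup_runs_in_child(const char *missing_prog, int quiet)
{
    static int registered = 0;
    int fds[2];
    char c;

    if (!registered) {
        atexit(parent_cleanup);   /* installed by the "master"... */
        registered = 1;
    }
    if (pipe(fds) < 0)
        return -1;

    fflush(NULL);                 /* keep the child from re-flushing our stdio buffers */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        char *argv[] = { (char *)missing_prog, NULL };
        execvp(missing_prog, argv);
        /* ...and inherited by the child, so exit() runs it here too. */
        if (quiet)
            _exit(127);
        exit(127);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], &c, 1);  /* 0 (EOF) if the handler never wrote */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

called against a missing path with quiet = 0, the parent's handler
runs inside the child; with quiet = 1 it does not.  that is why a
failed exec in a forked child usually ends with _exit() rather than
exit() or anything heavier.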

new plan:
  before the fork, check whether the mailer is executable.  if
  not, print a message to the log and don't bother forking.  also,
  if the execvp() call fails, just exit instead of EXCEPTing.
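a rough sketch of the new plan, assuming a POSIX system and a mailer
invoked as `mailer -s subject recipient` with the message body on
stdin.  open_mailer(), close_mailer(), mailer_is_executable() and
struct mailer are hypothetical names, not the real email.c interface,
and the fprintf() stands in for writing to the daemon log.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle pairing the pipe to the mailer with its pid. */
struct mailer {
    FILE *fp;   /* write end: the message body goes here */
    pid_t pid;  /* child running the mailer */
};

/* Check the mailer up front, in the parent, where errors can still be logged. */
int mailer_is_executable(const char *path)
{
    return access(path, X_OK) == 0;
}

/* Returns 0 on success, -1 if the mailer could not be started. */
int open_mailer(const char *mailer, const char *subject, const char *to,
                struct mailer *m)
{
    int fds[2];

    if (!mailer_is_executable(mailer)) {
        /* Stands in for a message to the daemon log; no fork happens. */
        fprintf(stderr, "mailer %s is not executable (%s); not sending mail\n",
                mailer, strerror(errno));
        return -1;
    }
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: take the message on stdin, then become the mailer. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        char *argv[] = { (char *)mailer, "-s", (char *)subject, (char *)to, NULL };
        execvp(mailer, argv);
        /* exec failed: leave quietly. _exit() skips the atexit handlers
           and stdio buffers inherited from the parent. */
        _exit(127);
    }

    /* Parent: keep only the write end. */
    close(fds[0]);
    m->fp = fdopen(fds[1], "w");
    if (m->fp == NULL) {
        close(fds[1]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    m->pid = pid;
    return 0;
}

/* Close the pipe and reap the mailer; returns its exit status, or -1. */
int close_mailer(struct mailer *m)
{
    int status;

    fclose(m->fp);
    if (waitpid(m->pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

access() only checks the path at that moment, so the mailer could
still vanish before the exec; the quiet _exit() in the child covers
that case, and the early check mainly lets the parent log a readable
message instead of losing it in the child.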


cheers,
-zach