Re: [Condor-users] CondorMaster restarting sometime
- Date: Wed, 20 Jan 2010 12:23:41 -0600
- From: Nick LeRoy <nleroy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] CondorMaster restarting sometime
On Wednesday 20 January 2010, Henning Fehrmann wrote:
> Hello,
>
> we have Condor 7.4.1 running and have observed that, on the nodes running a
> startd, the condor_master process occasionally exits with code 0 and then
> restarts. This happens on arbitrary nodes at arbitrary times. We have not
> yet been able to correlate this with a particular kind of job.
> We increased the verbosity on some nodes and collected the logs.
>
> I took the logs from around such an event and put the CKPTLog, MasterLog and
> StartLog of the startd node and the CollectorLog of the submit host into
> a tarball:
>
> http://atlas1.atlas.aei.uni-hannover.de/~fehrmann/condor_log.tgz
>
> Unfortunately, we were too slow: log rotation had already erased the
> corresponding events in the StarterLogs.
>
> If you need the configuration or more logging, please tell us.
I see something suspicious in the MasterLog: the master got a SIGTERM and did
what it's supposed to do, which is shut down cleanly. It's not at all clear
why it's getting the SIGTERM, however:
01/18 18:34:01 (fd:8) (pid:9465) DaemonCore: received Signal 15 (SIGTERM), raising event handle_dc_sigterm()
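Why a SIGTERM can show up as "exit code 0": a process that catches SIGTERM and shuts itself down exits normally, while a process killed by the default SIGTERM action reports the signal instead. Here is a minimal Python sketch of that general POSIX behavior. It is not Condor's DaemonCore code; the two child programs are hypothetical stand-ins, and the negative return code is Python's `subprocess` convention for "terminated by signal N".

```python
import signal
import subprocess
import sys

# Hypothetical child: catches SIGTERM and exits 0, loosely mimicking a daemon
# that treats SIGTERM as a request for a graceful shutdown.
GRACEFUL_CHILD = r"""
import signal, sys, time
def on_term(signum, frame):
    # A real daemon would stop its children and flush state here.
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(1)
"""

# Hypothetical child that keeps the default SIGTERM disposition (terminate).
DEFAULT_CHILD = r"""
import time
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def exit_status_after_sigterm(child_source: str) -> int:
    """Start a child, wait until it is ready, send SIGTERM, return its status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the child reports "ready"
    proc.send_signal(signal.SIGTERM)
    status = proc.wait(timeout=10)
    proc.stdout.close()
    return status


if __name__ == "__main__":
    print("graceful handler:", exit_status_after_sigterm(GRACEFUL_CHILD))
    print("default action:  ", exit_status_after_sigterm(DEFAULT_CHILD))
```

On a typical Linux system the first child should report 0 and the second -15, which is why a clean exit status alone doesn't tell you whether someone sent the master a TERM.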
Earlier in the log there's the following, but I think it's a
DAEMON_OFF_PEACEFUL command aimed at the startd (the master then sends the
startd a TERM):
01/18 18:33:52 (fd:9) (pid:9465) Received TCP command 483 (DAEMON_OFF_PEACEFUL) from <10.10.1.74:56227>, access level ADMINISTRATOR
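To line up incoming commands and signals across many nodes' MasterLogs, a small filter can help. This is a hypothetical helper, not a Condor tool. It assumes each log entry sits on a single line (the mail above wraps them) and that entries start with the `MM/DD HH:MM:SS (fd:N) (pid:N)` prefix shown in these excerpts; the `(fd:…)` and `(pid:…)` parts are treated as optional.

```python
import re
from typing import Iterable, List, Tuple

# Start of a log entry, e.g. "01/18 18:34:01 (fd:8) (pid:9465) ".
_PREFIX = r"^(?P<ts>\d\d/\d\d \d\d:\d\d:\d\d) (?:\(fd:\d+\) )?(?:\(pid:\d+\) )?"

_SIGNAL = re.compile(_PREFIX + r".*received Signal (?P<num>\d+) \((?P<name>\w+)\)")
_COMMAND = re.compile(
    _PREFIX
    + r"Received TCP command (?P<num>\d+) \((?P<name>\w+)\) from <(?P<src>[^>]+)>"
)


def shutdown_timeline(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, kind, detail) for received signals and TCP commands."""
    events = []
    for line in lines:
        m = _SIGNAL.search(line)
        if m:
            events.append((m["ts"], "signal", f"{m['name']} ({m['num']})"))
            continue
        m = _COMMAND.search(line)
        if m:
            events.append((m["ts"], "command", f"{m['name']} from {m['src']}"))
    return events


# The two excerpts quoted in this thread, unwrapped onto single lines.
SAMPLE = [
    "01/18 18:33:52 (fd:9) (pid:9465) Received TCP command 483 "
    "(DAEMON_OFF_PEACEFUL) from <10.10.1.74:56227>, access level ADMINISTRATOR",
    "01/18 18:34:01 (fd:8) (pid:9465) DaemonCore: received Signal 15 (SIGTERM), "
    "raising event handle_dc_sigterm()",
]

for ts, kind, detail in shutdown_timeline(SAMPLE):
    print(ts, kind, detail)
```

Feeding it an open MasterLog file instead of `SAMPLE` works the same way, since a file object iterates line by line.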
I'd look around on the system and see what could be sending a TERM to the
master.
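One way to see who sends a signal: the kernel records the sender's pid and uid in the signal's `siginfo`, which a receiver can read if it waits for the signal synchronously. The Linux-only Python sketch below shows that mechanism in a toy process that signals itself. It cannot be bolted onto condor_master, and `sigterm_sender` and `describe_sender` are hypothetical helper names; it only illustrates what information is available.

```python
import os
import signal
from typing import Optional


def block_sigterm() -> None:
    """Block SIGTERM so it stays pending instead of terminating the process."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def sigterm_sender(timeout: float) -> "Optional[signal.struct_siginfo]":
    """Wait up to `timeout` seconds for a blocked SIGTERM; None on timeout."""
    return signal.sigtimedwait({signal.SIGTERM}, timeout)


def describe_sender(pid: int) -> str:
    """Best-effort lookup of the sender's command line via /proc (Linux)."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read().replace(b"\0", b" ").decode(errors="replace").strip()
    except OSError:
        return "<process already gone>"


# Self-contained demonstration: this process signals itself, so si_pid is
# our own pid. Note that SIGTERM stays blocked in this process afterwards.
block_sigterm()
os.kill(os.getpid(), signal.SIGTERM)
info = sigterm_sender(timeout=5.0)
print(f"SIGTERM from pid {info.si_pid}, uid {info.si_uid}: {describe_sender(info.si_pid)}")
```

For an unmodified daemon such as the master, the Linux audit subsystem (auditd) can log `kill()` system calls system-wide, which records the same sender details without touching the receiving process.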
-Nick
--
<<< Why, oh, why, didn't I take the blue pill? >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy@xxxxxxxxxxx The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences