
Re: [Condor-users] Why does the Starter daemon keep dying with SEGV?



On Jul 17, 2006, at 2:15 AM, Mark Calleja wrote:

We're running a Linux pool with Condor 6.6.11 and we persistently see a number of vanilla jobs whose Starter keeps dying with the following (from the StartLog):

7/17 08:07:03 Starter pid 16900 died on signal 11 (signal 11)
7/17 08:07:03 vm1: State change: starter exited

The StarterLog shows nothing, even with full debug turned on. The jobs
then keep resubmitting themselves, only to die a similar death. As far as I
can tell this is the daemon itself dying, not the application that it's
running (which runs fine from the console). We're using the dynamically
linked binaries under Debian "etch". Can anyone shed any light on why this
should be happening and, more importantly, how we can fix it?

What does the starter log say around the time of the segfault?
Are there any core files in the condor log directory?
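
One way to check for core files is sketched below. It assumes the core files are named "core" or "core.<pid>" (this depends on the kernel's core pattern settings) and that you pass in the Condor log directory yourself, for example the path printed by "condor_config_val LOG". The function name find_core_files is made up for this illustration and is not part of Condor.

```python
import glob
import os


def find_core_files(log_dir):
    """Return regular files in log_dir whose names start with 'core', newest first.

    Assumption: core files are named 'core' or 'core.<pid>'; adjust the
    pattern if your system uses a different core file naming scheme.
    """
    pattern = os.path.join(log_dir, "core*")
    paths = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    # Sort by modification time so the most recent crash is listed first.
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    import sys

    # The directory to scan is taken from the command line; default is the
    # current directory.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_core_files(log_dir):
        print(path, os.path.getsize(path), "bytes")
```

If this lists nothing, it may also be worth checking whether core dumps are disabled by the core file size limit of the environment the daemons were started from.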

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+