[Condor-users] STARTD died due to exception ACCESS_VIOLATION
- Date: Wed, 19 Dec 2007 14:27:26 -0800
- From: "Finch, Ralph" <rfinch@xxxxxxxxxxxx>
- Subject: [Condor-users] STARTD died due to exception ACCESS_VIOLATION
condor_version
$CondorVersion: 6.8.3 Jan 5 2007 $
$CondorPlatform: INTEL-WINNT50 $
A quad-core machine that had been happily running Condor and submitted
jobs started to fail this morning. The MasterLog, StartLog, and core file
are below.
The only changes I made to the machine were to install Microsoft .NET
framework v2.0 and to try running an MPI client (FAH SMP Beta Client
http://folding.stanford.edu/English/FAQ-SMP#ntoc2).
MasterLog:
12/19 14:05:03 ** Condor (CONDOR_MASTER) STARTING UP
12/19 14:05:03 ** Z:\condor\bin\condor_master.exe
12/19 14:05:03 ** $CondorVersion: 6.8.3 Jan 5 2007 $
12/19 14:05:03 ** $CondorPlatform: INTEL-WINNT50 $
12/19 14:05:03 ** PID = 2192
12/19 14:05:03 ** Log last touched 12/19 14:02:29
12/19 14:05:03 ******************************************************
12/19 14:05:03 Using config source: Z:\condor\condor_config
12/19 14:05:03 Using local config sources:
12/19 14:05:03 Z:/Condor/condor_config.local
12/19 14:05:03 DaemonCore: Command Socket at <136.200.32.87:2043>
12/19 14:05:03 Started DaemonCore process "Z:/Condor/condor-6.8.3/bin/condor_startd.exe", pid and pgroup = 4036
12/19 14:05:03 Started DaemonCore process "Z:/Condor/condor-6.8.3/bin/condor_schedd.exe", pid and pgroup = 4068
12/19 14:07:07 DaemonCore: Command received via UDP from host <136.200.32.87:2069>
12/19 14:07:07 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
12/19 14:07:07 The STARTD (pid 4036) died due to exception ACCESS_VIOLATION
12/19 14:07:07 Sending obituary for "Z:/Condor/condor-6.8.3/bin/condor_startd.exe"
12/19 14:07:07 restarting Z:/Condor/condor-6.8.3/bin/condor_startd.exe in 10 seconds
12/19 14:07:17 Started DaemonCore process "Z:/Condor/condor-6.8.3/bin/condor_startd.exe", pid and pgroup = 3196
12/19 14:09:28 DaemonCore: Command received via UDP from host <136.200.32.87:2208>
12/19 14:09:28 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
12/19 14:09:28 The STARTD (pid 3196) died due to exception ACCESS_VIOLATION
12/19 14:09:28 Sending obituary for "Z:/Condor/condor-6.8.3/bin/condor_startd.exe"
12/19 14:09:28 restarting Z:/Condor/condor-6.8.3/bin/condor_startd.exe in 11 seconds
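
To see how long each startd survives before crashing, the MasterLog excerpt above can be summarized with a small script. This is only a sketch, not part of Condor: the function name startd_lifetimes and the regular expressions are made up for illustration, it assumes each log entry has been rejoined onto a single line (the mail client wrapped them), and it assumes the year 2007 from the message date because the log itself records none.

```python
import re
from datetime import datetime

# Four MasterLog entries from the excerpt, each rejoined onto one line.
SAMPLE = """\
12/19 14:05:03 Started DaemonCore process "Z:/Condor/condor-6.8.3/bin/condor_startd.exe", pid and pgroup = 4036
12/19 14:07:07 The STARTD (pid 4036) died due to exception ACCESS_VIOLATION
12/19 14:07:17 Started DaemonCore process "Z:/Condor/condor-6.8.3/bin/condor_startd.exe", pid and pgroup = 3196
12/19 14:09:28 The STARTD (pid 3196) died due to exception ACCESS_VIOLATION
"""

TS = r"(\d\d/\d\d \d\d:\d\d:\d\d)"
STARTED = re.compile("^" + TS + r' Started DaemonCore process ".*", pid and pgroup = (\d+)')
DIED = re.compile("^" + TS + r" The (\w+) \(pid (\d+)\) died due to exception (\w+)")


def parse_ts(text, year=2007):
    # The log has no year; 2007 is an assumption taken from the message date.
    return datetime.strptime(f"{year} {text}", "%Y %m/%d %H:%M:%S")


def startd_lifetimes(log_text):
    """Return (pid, exception, seconds_alive) for each daemon that died."""
    started = {}   # pid -> start time
    results = []
    for line in log_text.splitlines():
        m = STARTED.match(line)
        if m:
            started[int(m.group(2))] = parse_ts(m.group(1))
            continue
        m = DIED.match(line)
        if m:
            pid = int(m.group(3))
            if pid in started:
                alive = parse_ts(m.group(1)) - started[pid]
                results.append((pid, m.group(4), int(alive.total_seconds())))
    return results


if __name__ == "__main__":
    for pid, exc, secs in startd_lifetimes(SAMPLE):
        print(f"pid {pid}: {exc} after {secs} s")
```

On the two crashes shown here, the startd lives for roughly two minutes each time, and in both cases the death is logged in the same second as a DC_NOP command arriving over UDP.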
StartLog:
12/19 14:20:38 vm2: State change: IS_OWNER is false
12/19 14:20:38 vm2: Changing state: Owner -> Unclaimed
12/19 14:20:38 vm2: State change: IS_OWNER is TRUE
12/19 14:20:38 vm2: Changing state: Unclaimed -> Owner
12/19 14:20:38 vm2: State change: IS_OWNER is false
12/19 14:20:38 vm2: Changing state: Owner -> Unclaimed
12/19 14:20:38 vm2: State change: IS_OWNER is TRUE
12/19 14:20:38 vm2: Changing state: Unclaimed -> Owner
12/19 14:20:38 vm2: State change: IS_OWNER is false
12/19 14:20:38 vm2: Changing state: Owner -> Unclaimed
12/19 14:20:38 vm2: State change: IS_OWNER is TRUE
12/19 14:20:38 vm2: Changing state: Unclaimed -> Owner
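
The StartLog excerpt shows vm2 bouncing between Owner and Unclaimed several times within one second. A rough way to spot that kind of flapping in a longer log is to count transitions per slot per timestamp. The sketch below is illustrative only: transitions_per_second is a hypothetical helper, and the pattern is based solely on the "Changing state" lines quoted above.

```python
import re
from collections import Counter

# The "Changing state" entries from the StartLog excerpt.
SAMPLE = """\
12/19 14:20:38 vm2: State change: IS_OWNER is false
12/19 14:20:38 vm2: Changing state: Owner -> Unclaimed
12/19 14:20:38 vm2: State change: IS_OWNER is TRUE
12/19 14:20:38 vm2: Changing state: Unclaimed -> Owner
12/19 14:20:38 vm2: State change: IS_OWNER is false
12/19 14:20:38 vm2: Changing state: Owner -> Unclaimed
12/19 14:20:38 vm2: State change: IS_OWNER is TRUE
12/19 14:20:38 vm2: Changing state: Unclaimed -> Owner
12/19 14:20:38 vm2: State change: IS_OWNER is false
12/19 14:20:38 vm2: Changing state: Owner -> Unclaimed
12/19 14:20:38 vm2: State change: IS_OWNER is TRUE
12/19 14:20:38 vm2: Changing state: Unclaimed -> Owner
"""

CHANGE = re.compile(
    r"^(\d\d/\d\d \d\d:\d\d:\d\d) (\w+): Changing state: (\w+) -> (\w+)"
)


def transitions_per_second(log_text):
    """Count state transitions keyed by (timestamp, slot name)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = CHANGE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (stamp, slot), n in transitions_per_second(SAMPLE).items():
        if n > 1:  # more than one transition in the same second looks like flapping
            print(f"{stamp} {slot}: {n} transitions")
```

Any slot with more than one transition under the same timestamp is worth a closer look. What makes IS_OWNER flip on this machine cannot be read from the log excerpt alone.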
core.STARTD.WIN32:
//=====================================================
Exception code: C00000FD STACK_OVERFLOW
Fault address: 7C809C0B 01:00008C0B C:\WINDOWS\system32\kernel32.dll
Registers:
EAX:00003F15
EBX:00000000
ECX:00000000
EDX:FFFFFFFF
ESI:00000100
EDI:00000000
CS:EIP:001B:7C809C0B
SS:ESP:0023:00032FE4 EBP:00033008
DS:0023 ES:0023 FS:003B GS:0000
Flags:00010206