Mailing List Archives
[Condor-users] Jobs that should be suspended are evicted, and STARTD crashes
- Date: Mon, 24 Oct 2005 14:36:07 -0700
- From: "Finch, Ralph" <rfinch@xxxxxxxxxxxx>
- Subject: [Condor-users] Jobs that should be suspended are evicted, and STARTD crashes
condor_version
$CondorVersion: 6.7.12 Sep 24 2005 $
$CondorPlatform: INTEL-WINNT50 $
I'm trying to suspend a job using a simple keyboard-idle test, and I
have PREEMPT = FALSE in the condor_config file. Instead of being
suspended when I touch the keyboard, however, the job is evicted,
along with another job from the same cluster that is running on the
same SMP machine but in a different VM. I notice that the following
bug was fixed recently and wonder whether it is still lingering:
- Fixed a bug that would cause the condor startd to crash under certain
  conditions during job eviction. This bug was introduced in Condor
  version 6.6.6.
Pool Manager MasterLog:
10/24 13:43:51 DaemonCore: Command received via UDP from host <136.200.32.102:2979>
10/24 13:43:51 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
10/24 13:43:51 The STARTD (pid 2572) exited with status 4
10/24 13:43:56 Procfamily: ERROR: Could not open pid 2932 (err=87). Maybe it exited already?
10/24 13:44:04 Sending obituary for "Z:\Condor/bin/condor_startd.exe"
10/24 13:44:04 restarting Z:\Condor/bin/condor_startd.exe in 10 seconds
10/24 13:44:14 Started DaemonCore process "Z:\Condor/bin/condor_startd.exe", pid and pgroup = 632
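For anyone trying to reproduce this, the startd crash can be located mechanically in a MasterLog. What follows is a minimal Python sketch, not a Condor tool: the function `find_daemon_exits` is hypothetical, and the regular expression assumes only the line layout visible in the excerpt above (a timestamp, then `The <DAEMON> (pid N) exited with status S`). Other Condor versions or configurations may format these lines differently.

```python
import re

# Assumed layout, taken from the MasterLog excerpt above, e.g.
#   10/24 13:43:51 The STARTD (pid 2572) exited with status 4
EXIT_RE = re.compile(
    r"^(?P<stamp>\d\d/\d\d \d\d:\d\d:\d\d) The (?P<daemon>\w+) "
    r"\(pid (?P<pid>\d+)\) exited with status (?P<status>\d+)"
)


def find_daemon_exits(lines):
    """Hypothetical helper: return (timestamp, daemon, pid, exit status) per exit line."""
    exits = []
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if m:
            exits.append((m["stamp"], m["daemon"], int(m["pid"]), int(m["status"])))
    return exits


sample = [
    "10/24 13:43:51 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())",
    "10/24 13:43:51 The STARTD (pid 2572) exited with status 4",
    "10/24 13:44:04 restarting Z:\\Condor/bin/condor_startd.exe in 10 seconds",
]
print(find_daemon_exits(sample))  # [('10/24 13:43:51', 'STARTD', 2572, 4)]
```

Only lines reporting a daemon exit are returned, so the DC_NOP and restart messages are skipped.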
The job log:
...
007 (448.001.000) 10/24 13:44:30 Shadow exception!
Can no longer talk to condor_starter <136.200.32.102:2086>
0 - Run Bytes Sent By Job
113527448 - Run Bytes Received By Job
...
007 (448.000.000) 10/24 13:44:34 Shadow exception!
Can no longer talk to condor_starter <136.200.32.102:2086>
0 - Run Bytes Sent By Job
113527448 - Run Bytes Received By Job
...
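To confirm that neither job was ever suspended (rather than only evicted), the user job log can be scanned for event codes. This is a hedged Python sketch: `tally_events` is a hypothetical helper, the header pattern assumes the `NNN (cluster.proc.subproc) date time text` layout shown above, and the code-to-name table assumes the standard user-log numbering, in which 004 is eviction, 007 a shadow exception, 010 suspension and 011 unsuspension.

```python
import re
from collections import defaultdict

# Assumed event-header layout: "007 (448.001.000) 10/24 13:44:30 Shadow exception!"
HEADER_RE = re.compile(r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>\d+)\.\d+\)")

# Only the event codes relevant to "suspended vs. evicted" are tracked here.
EVENT_NAMES = {
    "004": "evicted",
    "007": "shadow exception",
    "010": "suspended",
    "011": "unsuspended",
}


def tally_events(lines):
    """Hypothetical helper: count tracked events per job id, keyed as 'cluster.proc'."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = HEADER_RE.match(line)
        if m and m["code"] in EVENT_NAMES:
            job = f"{int(m['cluster'])}.{int(m['proc'])}"
            counts[job][EVENT_NAMES[m["code"]]] += 1
    return {job: dict(events) for job, events in counts.items()}


sample = [
    "007 (448.001.000) 10/24 13:44:30 Shadow exception!",
    "Can no longer talk to condor_starter <136.200.32.102:2086>",
    "0 - Run Bytes Sent By Job",
    "113527448 - Run Bytes Received By Job",
    "...",
    "007 (448.000.000) 10/24 13:44:34 Shadow exception!",
    "...",
]
print(tally_events(sample))
# {'448.1': {'shadow exception': 1}, '448.0': {'shadow exception': 1}}
```

If the keyboard policy were working, one would presumably expect 010/011 entries for each job here instead of only 007 (or 004) events.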
Ralph Finch, P.E.
Dept. of Water Resources
Bay-Delta Office, Room 215-13
Sacramento, CA 95814
916-653-7552
rfinch@xxxxxxxxxxxx