[Condor-users] [Birdbath Related] Can't remove Jobs and Clusters
- Date: Sun, 26 Mar 2006 21:15:50 +0100
- From: Afrasyab Bashir <afrasyab@xxxxxxxxx>
- Subject: [Condor-users] [Birdbath Related] Can't remove Jobs and Clusters
Hi Matt,
I've accumulated many jobs with JobStatus = 1 in the queue. I'm now trying
to kill all of these jobs (or just remove them) using the removeJob and/or
removeCluster functions, without success. The condor.Status returned is a
null object every time. Could you please have a look at the log and advise?
A few things that I noticed in the log are listed below. Sorry if this is
very basic, but I can't understand it :(
a) ProcAPI sanity failure
b) command 60011 (DC_NOP), calling handler (handle_nop())
c) VACATE_SERVICE
d) RELEASE_CLAIM
Cheers
Afras
Log
3/26 20:53:00 (pid:4920) ProcAPI sanity failure, user_time = -167
3/26 20:53:00 (pid:4920) ProcAPI sanity failure, age = -97025303
3/26 20:54:18 (pid:4920) Activity on stashed negotiator socket
3/26 20:54:18 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP
3/26 20:54:18 (pid:4920) Checking consistency running and runnable jobs
3/26 20:54:18 (pid:4920) Tables are consistent
3/26 20:54:18 (pid:4920) Out of servers - 1 jobs matched, 1 jobs idle, 1
jobs rejected
3/26 20:54:18 (pid:4920) Activity on stashed negotiator socket
3/26 20:54:18 (pid:4920) Negotiating for owner: S2VP@afrasyab-LAPTOP
3/26 20:54:18 (pid:4920) Checking consistency running and runnable jobs
3/26 20:54:18 (pid:4920) Tables are consistent
3/26 20:54:18 (pid:4920) Out of servers - 1 jobs matched, 104 jobs idle, 3
jobs rejected
3/26 20:54:22 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:54:22 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 124)
3/26 20:54:23 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:54:23 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP
3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP
3/26 20:54:23 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP
3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP
3/26 20:54:24 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3283>
3/26 20:54:24 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4
3/26 20:54:24 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:26 (pid:4920) Starting add_shadow_birthdate(13.0)
3/26 20:54:26 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 3884)
3/26 20:54:26 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3285>
3/26 20:54:26 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:26 (pid:4920) Shadow pid 3884 for job 13.0 exited with status 4
3/26 20:54:26 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:28 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:54:28 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 3068)
3/26 20:54:28 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3287>
3/26 20:54:28 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:28 (pid:4920) Shadow pid 3068 for job 25.0 exited with status 4
3/26 20:54:28 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:30 (pid:4920) Starting add_shadow_birthdate(13.0)
3/26 20:54:31 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 1120)
3/26 20:54:31 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3289>
3/26 20:54:31 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:31 (pid:4920) Shadow pid 1120 for job 13.0 exited with status 4
3/26 20:54:31 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:33 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:54:34 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 5428)
3/26 20:54:34 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3291>
3/26 20:54:34 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:34 (pid:4920) Shadow pid 5428 for job 25.0 exited with status 4
3/26 20:54:34 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:36 (pid:4920) Starting add_shadow_birthdate(13.0)
3/26 20:54:36 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 4396)
3/26 20:54:37 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3296>
3/26 20:54:37 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:37 (pid:4920) Shadow pid 4396 for job 13.0 exited with status 4
3/26 20:54:37 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:38 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:54:38 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 5400)
3/26 20:54:39 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3298>
3/26 20:54:39 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:39 (pid:4920) Shadow pid 5400 for job 25.0 exited with status 4
3/26 20:54:39 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:40 (pid:4920) Starting add_shadow_birthdate(13.0)
3/26 20:54:40 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 4660)
3/26 20:54:41 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3300>
3/26 20:54:41 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:41 (pid:4920) Shadow pid 4660 for job 13.0 exited with status 4
3/26 20:54:41 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:42 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:54:42 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 4992)
3/26 20:54:43 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3302>
3/26 20:54:43 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:43 (pid:4920) Shadow pid 4992 for job 25.0 exited with status 4
3/26 20:54:43 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:43 (pid:4920) Match for cluster 25 has had 5 shadow exceptions,
relinquishing.
3/26 20:54:43 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.3:2472>
3/26 20:54:43 (pid:4920) Match record (<192.168.1.3:2472>, 25, 0) deleted
3/26 20:54:43 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.3:3305>
3/26 20:54:43 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)
3/26 20:54:43 (pid:4920) Got VACATE_SERVICE from <192.168.1.3:3305>
3/26 20:54:44 (pid:4920) Starting add_shadow_birthdate(13.0)
3/26 20:54:44 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 3924)
3/26 20:54:44 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:54:44 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP
3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP
3/26 20:54:44 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP
3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP
3/26 20:54:45 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3314>
3/26 20:54:45 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:54:45 (pid:4920) Shadow pid 3924 for job 13.0 exited with status 4
3/26 20:54:45 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:45 (pid:4920) Match for cluster 13 has had 5 shadow exceptions,
relinquishing.
3/26 20:54:45 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.2:2115>
3/26 20:54:45 (pid:4920) Match record (<192.168.1.2:2115>, 13, 0) deleted
3/26 20:54:49 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.2:3748>
3/26 20:54:49 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)
3/26 20:54:49 (pid:4920) Got VACATE_SERVICE from <192.168.1.2:3748>
3/26 20:57:00 (pid:4920) ProcAPI sanity failure, user_time = -165
3/26 20:57:01 (pid:4920) ProcAPI sanity failure, age = -97025063
3/26 20:58:43 (pid:4920) Received HTTP POST connection from
<192.168.1.3:3345>
3/26 20:58:43 (pid:4920) About to serve HTTP request...
3/26 20:58:44 (pid:4920) Completed servicing HTTP request
3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket
3/26 20:59:19 (pid:4920) Negotiating for owner:
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs
3/26 20:59:19 (pid:4920) Tables are consistent
3/26 20:59:19 (pid:4920) Out of jobs - 1 jobs matched, 0 jobs idle, flock
level = 0
3/26 20:59:19 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket
3/26 20:59:19 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs
3/26 20:59:19 (pid:4920) Tables are consistent
3/26 20:59:19 (pid:4920) Out of servers - 0 jobs matched, 2 jobs idle, 1
jobs rejected
3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket
3/26 20:59:19 (pid:4920) Negotiating for owner: S2VP@afrasyab-LAPTOP
3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs
3/26 20:59:19 (pid:4920) Tables are consistent
3/26 20:59:19 (pid:4920) Out of servers - 1 jobs matched, 104 jobs idle, 3
jobs rejected
3/26 20:59:23 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:59:24 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 5908)
3/26 20:59:24 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:24 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP
3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP
3/26 20:59:24 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP
3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP
3/26 20:59:25 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3367>
3/26 20:59:25 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:25 (pid:4920) Shadow pid 5908 for job 25.0 exited with status 4
3/26 20:59:25 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:27 (pid:4920) Starting add_shadow_birthdate(12.0)
3/26 20:59:27 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 5372)
3/26 20:59:27 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3369>
3/26 20:59:27 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:27 (pid:4920) Shadow pid 5372 for job 12.0 exited with status 4
3/26 20:59:27 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:29 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:59:30 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 4972)
3/26 20:59:30 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3371>
3/26 20:59:30 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:30 (pid:4920) Shadow pid 4972 for job 25.0 exited with status 4
3/26 20:59:30 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:32 (pid:4920) Starting add_shadow_birthdate(12.0)
3/26 20:59:32 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 3224)
3/26 20:59:33 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3373>
3/26 20:59:33 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:33 (pid:4920) Shadow pid 3224 for job 12.0 exited with status 4
3/26 20:59:33 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:34 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:59:34 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 3000)
3/26 20:59:34 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3375>
3/26 20:59:34 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:34 (pid:4920) Shadow pid 3000 for job 25.0 exited with status 4
3/26 20:59:34 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:36 (pid:4920) Starting add_shadow_birthdate(12.0)
3/26 20:59:36 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 5884)
3/26 20:59:37 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3380>
3/26 20:59:37 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:37 (pid:4920) Shadow pid 5884 for job 12.0 exited with status 4
3/26 20:59:37 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:38 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:59:38 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 652)
3/26 20:59:38 (pid:4920) Received HTTP POST connection from
<192.168.1.3:3382>
3/26 20:59:38 (pid:4920) About to serve HTTP request...
3/26 20:59:39 (pid:4920) Completed servicing HTTP request
3/26 20:59:40 (pid:4920) Starting add_shadow_birthdate(12.0)
3/26 20:59:40 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 4500)
3/26 20:59:40 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:40 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP
3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP
3/26 20:59:40 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP
3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP
3/26 20:59:40 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3388>
3/26 20:59:40 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:40 (pid:4920) Shadow pid 652 for job 25.0 exited with status 4
3/26 20:59:40 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:41 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3389>
3/26 20:59:41 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:41 (pid:4920) Shadow pid 4500 for job 12.0 exited with status 4
3/26 20:59:41 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:43 (pid:4920) Starting add_shadow_birthdate(25.0)
3/26 20:59:43 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 4300)
3/26 20:59:43 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3391>
3/26 20:59:43 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:43 (pid:4920) Shadow pid 4300 for job 25.0 exited with status 4
3/26 20:59:43 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:43 (pid:4920) Match for cluster 25 has had 5 shadow exceptions,
relinquishing.
3/26 20:59:43 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.3:2472>
3/26 20:59:43 (pid:4920) Match record (<192.168.1.3:2472>, 25, 0) deleted
3/26 20:59:43 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.3:3394>
3/26 20:59:43 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)
3/26 20:59:43 (pid:4920) Got VACATE_SERVICE from <192.168.1.3:3394>
3/26 20:59:45 (pid:4920) Starting add_shadow_birthdate(12.0)
3/26 20:59:46 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 4992)
3/26 20:59:46 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP
3/26 20:59:46 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP
3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP
3/26 20:59:46 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP
3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP
3/26 20:59:46 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3402>
3/26 20:59:46 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())
3/26 20:59:46 (pid:4920) Shadow pid 4992 for job 12.0 exited with status 4
3/26 20:59:46 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:59:46 (pid:4920) Match for cluster 12 has had 5 shadow exceptions,
relinquishing.
3/26 20:59:46 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.2:2115>
3/26 20:59:46 (pid:4920) Match record (<192.168.1.2:2115>, 12, 0) deleted
3/26 20:59:51 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.2:3761>
3/26 20:59:51 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)
3/26 20:59:51 (pid:4920) Got VACATE_SERVICE from <192.168.1.2:3761>
3/26 21:01:01 (pid:4920) ProcAPI sanity failure, user_time = -164
3/26 21:01:01 (pid:4920) ProcAPI sanity failure, age = -97024822
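
To make the repeating pattern in the log above easier to see, here is a minimal sketch that counts how often each job's shadow exited, and with which status. It assumes the log is available as plain text in the wrapped form shown above. The names join_wrapped, count_shadow_exits and SAMPLE are hypothetical helpers written for this illustration; they are not part of Condor or Birdbath.

```python
import re
from collections import Counter

# Assumption: every log record starts with "M/D HH:MM:SS (pid:N) ";
# any other non-empty line is the wrapped tail of the previous record.
RECORD_START = re.compile(r"^\d+/\d+ \d\d:\d\d:\d\d \(pid:\d+\) ")
SHADOW_EXIT = re.compile(
    r"Shadow pid \d+ for job (\d+\.\d+) exited with status (\d+)"
)


def join_wrapped(lines):
    """Re-join records that the mail client wrapped onto a second line."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def count_shadow_exits(lines):
    """Count shadow exits, keyed by (job id, exit status)."""
    counts = Counter()
    for record in join_wrapped(lines):
        match = SHADOW_EXIT.search(record)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# A few records copied from the log above, including one wrapped record.
SAMPLE = """\
3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4
3/26 20:54:24 (pid:4920) ERROR: Shadow exited with job exception code!
3/26 20:54:26 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 3884)
3/26 20:54:26 (pid:4920) Shadow pid 3884 for job 13.0 exited with status 4
3/26 20:54:28 (pid:4920) Shadow pid 3068 for job 25.0 exited with status 4
"""

for (job, status), n in count_shadow_exits(SAMPLE.splitlines()).most_common():
    print(f"job {job} exited with status {status}: {n} time(s)")
```

Running the same function over the full log would show which jobs keep failing with status 4 before the match is relinquished.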