[Condor-users] killed jobs hang around in idle state
- Date: Tue, 22 Jun 2004 13:15:32 +0100
- From: "Dr Ian C. Smith" <i.c.smith@xxxxxxxxxxxxxxx>
- Subject: [Condor-users] killed jobs hang around in idle state
Hi,
I'm having trouble killing jobs at a certain time of day
with Condor 6.6.5 on Win2K. After the job is killed, it
stays in the idle state indefinitely:
C:\Condor\ics>condor_q -analyze
-- Submitter: 102153-71130c.liv.ac.uk : <138.253.102.153:1042> : 102153-71130c.liv.ac.uk
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
187.000: Run analysis summary. Of 2 machines,
1 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match, but are serving users with a better priority in the pool
1 match, but prefer another specific job despite its worse user-priority
0 match, but will not currently preempt their existing job
0 are available to run your job
Last successful match: Tue Jun 22 13:05:31 2004
1 jobs; 1 idle, 0 running, 0 held
The config file looks like:
WANT_SUSPEND = FALSE
WANT_VACATE = TRUE
START = TRUE
SUSPEND = ClockMin > 660
CONTINUE = FALSE
PREEMPT = TRUE
KILL = TRUE
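As a sanity check on the SUSPEND threshold, here is a minimal Python sketch that converts between ClockMin values and wall-clock times. It assumes ClockMin is the number of minutes since midnight; the helper names are made up for illustration and are not part of Condor.

```python
# Hypothetical helpers, not part of Condor: they only illustrate the
# arithmetic behind a ClockMin threshold (assumed: minutes since midnight).

def clock_min_to_time(clock_min: int) -> str:
    """Convert minutes since midnight to an HH:MM string."""
    hours, minutes = divmod(clock_min, 60)
    return f"{hours:02d}:{minutes:02d}"


def time_to_clock_min(hour: int, minute: int) -> int:
    """Convert a wall-clock time to minutes since midnight."""
    return hour * 60 + minute


if __name__ == "__main__":
    # Threshold used in SUSPEND = ClockMin > 660
    print(clock_min_to_time(660))          # 11:00
    # Time of the last successful match (13:05) compared to the threshold
    print(time_to_clock_min(13, 5) > 660)  # True
```

Under that assumption, 660 corresponds to 11:00, so the 13:05 match time reported by condor_q falls after the threshold.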
Something seems to be wrong, judging by the SchedLog:
6/22 13:05:57 DaemonCore: Command received via TCP from host <138.253.102.153:1365>
6/22 13:05:57 DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
6/22 13:05:57 Got VACATE_SERVICE from <138.253.102.153:1365>
6/22 13:05:57 Sent RELEASE_CLAIM to startd on <138.253.102.153:1041>
6/22 13:05:57 Match record (<138.253.102.153:1041>, 187, 0) deleted
6/22 13:05:57 DaemonCore: Command received via UDP from host <138.253.102.153:1367>
6/22 13:05:57 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
6/22 13:05:57 Scheduler::Relinquish - mrec is NULL, can't relinquish
6/22 13:05:57 Null parameter --- match not deleted
6/22 13:06:04 DaemonCore: Command received via UDP from host <138.253.102.153:1371>
Any ideas?
Thanks in advance,
-ian.