[Condor-users] Jobs stuck in the removing (JobStatus == 3) state
- Date: Fri, 25 May 2007 15:40:09 -0400
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: [Condor-users] Jobs stuck in the removing (JobStatus == 3) state
I've got a user who somehow managed to get a handful of jobs stuck in the
JobStatus == 3 (X) state. No amount of condor_rm'ing has been able to
get them out of the queue. We're running Quill, and I think the problem
may simply be that Quill is hung and not updating the job status, but I
can't get condor_q -direct schedd to work, so I can't verify this.
Here are the jobs as Quill reports them:
/ttcbatch> /opt/condor/bin/condor_q -const "JobStatus == 3" -direct quilld
-- Submitter: quill-sj-schedd1.altera.com@xxxxxxxxxxxxxxxxxxxxx : <137.57.202.107:40428> : sj-schedd1.altera.com
ID        OWNER     SUBMITTED    RUN_TIME    ST  PRI  SIZE    CMD
85728.0   pkazaria  5/21 16:49   0+00:00:00  X   0    1074.2  wrapper.pl /experi
85728.33  pkazaria  5/21 16:49   0+00:00:00  X   0    898.4   wrapper.pl /experi
85728.35  pkazaria  5/21 16:49   0+00:00:00  X   0    878.9   wrapper.pl /experi
85728.50  pkazaria  5/21 16:49   0+00:00:01  X   0    58.6    wrapper.pl /experi
91944.63  pkazaria  5/24 09:10   0+00:02:07  X   0    712.9   wrapper.bat /exper
And if I try to get this info straight from the schedd I get:
/ttcbatch> /opt/condor/bin/condor_q -const "JobStatus == 3" -direct schedd
-- Failed to fetch ads from: <137.57.202.107:52744> : sj-schedd1.altera.com
And condor_rm says:
/ttcbatch> /opt/condor/bin/condor_rm -const "JobStatus == 3"
AUTHENTICATE:1002:Failure performing handshake
Couldn't find/remove all jobs matching constraint (JobStatus == 3)
And my SchedLog is littered with:
/ttcbatch> tail /build/condor/log/SchedLog
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
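For anyone wanting to see how many of these messages pile up per job, here is a minimal Python sketch that tallies them. It is not part of Condor; the function name `summarize_owner_check_failures` is hypothetical, and the pattern assumes the log lines look exactly like the ones shown above.

```python
import re
from collections import Counter

# Assumed line format, taken from the SchedLog excerpt above:
#   5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0
OWNER_CHECK_RE = re.compile(
    r"OwnerCheck\((?P<user>[^)]+)\) failed in SetAttribute "
    r"for job (?P<job>\d+\.\d+)"
)


def summarize_owner_check_failures(lines):
    """Count OwnerCheck failures per (user, job id) pair.

    Hypothetical helper for illustration only; lines that do not
    match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = OWNER_CHECK_RE.search(line)
        if match:
            counts[(match.group("user"), match.group("job"))] += 1
    return counts


# The ten identical lines from the excerpt above, used as sample input.
sample_lines = [
    "5/25 12:36:55 OwnerCheck(root) failed in SetAttribute for job 85728.0"
] * 10

summary = summarize_owner_check_failures(sample_lines)
for (user, job), count in sorted(summary.items()):
    print(f"{job}: {count} OwnerCheck failure(s) as user {user}")
```

To run it against a real log, the same function could be fed an open file handle instead of the sample list, since it only iterates over lines.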
I'd rather not restart my schedd. Is there a way to clear this up
without a schedd restart?
- Ian
--
Ian R. Chesal <ichesal@xxxxxxxxxx>
Senior Software Engineer
Altera Corporation
Toronto Technology Center
Tel: (416) 926-8300