[Condor-users] some held/released jobs never execute
- Date: Fri, 16 Jun 2006 13:07:31 +0200
- From: Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx>
- Subject: [Condor-users] some held/released jobs never execute
Hi,
With Condor 6.7.19 on Windows XP I have a strange problem. Sometimes
when I try to vacate a job from a machine, the condor_vacate_job
command does not work, and I have to hold and release the job to get
it renegotiated. But after going back to the idle state, the job never
gets executed again.
This is what I found in the shadow log for such a job:
6/16 12:35:56 (68770.0) (472): Got SIGTERM. Performing graceful shutdown.
6/16 12:35:57 (68770.0) (472): getpeername failed so connect must have failed
6/16 12:36:16 (68770.0) (472): Connect failed for 20 seconds; returning FALSE
6/16 12:36:16 (68770.0) (472): RemoteResource::killStarter(): Could not send command to startd
And this is the last item for it in the scheduler log:
6/16 12:11:36 Starting add_shadow_birthdate(68770.0)
6/16 12:11:36 Started shadow for job 68770.0 on "<192.168.0.101:1040>", (shadow pid = 472)
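In case it helps anyone reproduce the search, here is a minimal sketch of how the entries for one job can be pulled out of a log. It is plain text filtering and not part of Condor; the function name lines_for_job is hypothetical, and the sample text is just a few of the lines quoted above.

```python
import re

def lines_for_job(log_text, job_id):
    """Return the log lines that mention job_id (for example "68770.0")."""
    # Match the id as a whole token, so "68770.0" does not also match
    # "168770.0" or "68770.01".
    pattern = re.compile(r"(?<![\d.])" + re.escape(job_id) + r"(?!\d)")
    return [line for line in log_text.splitlines() if pattern.search(line)]

# Sample taken from the excerpts in this message.
sample = """\
6/16 12:11:36 Starting add_shadow_birthdate(68770.0)
6/16 12:11:36 Started shadow for job 68770.0 on "<192.168.0.101:1040>", (shadow pid = 472)
6/16 12:35:56 (68770.0) (472): Got SIGTERM. Performing graceful shutdown.
"""

matches = lines_for_job(sample, "68770.0")
for line in matches:
    print(line)
```

Running the same filter over both the shadow log and the scheduler log makes it easier to line up the timestamps for a single job.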
What is going on behind the scenes and what can I do to force the
execution of this job?
Cheers,
Szabolcs