[Condor-users] Idle, stuck Condor-G jobs
- Date: Sat, 27 Oct 2007 16:22:30 +0200
- From: Jan Ploski <Jan.Ploski@xxxxxxxx>
- Subject: [Condor-users] Idle, stuck Condor-G jobs
Hello,
I have a few Condor-G jobs which were originally held due to an
(unexplained) error during stage out. I then released these jobs
manually, in the hope that they would be resubmitted. Now they are reported
as idle by condor_q, with status STAGE_IN. I can see lots of messages in
the Globus container.log mentioning
org.globus.gsi.proxy.ProxyPathValidatorException - my proxy certificate
has apparently expired, and I guess that these messages are caused by
Condor-G trying to submit these jobs.
Shouldn't Condor hold the jobs and report the expired proxy certificate
as the hold reason in such a situation?
Also, what puzzles me is that when I released the jobs from the original
hold, the proxy certificate was certainly valid - the expired messages
did not start appearing until some 7 hours after the job release, which
should have been more than enough time for the jobs to be successfully
resubmitted, executed and possibly held again (these jobs are the only
jobs in the queue, there is no other load, etc). Is my assumption that
the released Condor-G jobs are resubmitted on the next scheduling cycle
incorrect?
Regards,
Jan Ploski