
[Condor-users] Idle, stuck Condor-G jobs



Hello,

I have a few Condor-G jobs which were originally held due to an (unexplained) error during stage out. I then released these jobs manually, in the hope that they would be resubmitted. Now condor_q reports them as idle, with status STAGE_IN. I can see lots of messages in the Globus container.log mentioning org.globus.gsi.proxy.ProxyPathValidatorException. My proxy certificate has apparently expired, and I guess that these messages are caused by Condor-G trying to submit these jobs.
Shouldn't Condor hold the jobs and report the expired proxy certificate 
as the hold reason in such a situation?
Also, what puzzles me is that when I released the jobs from the original 
hold, the proxy certificate was certainly valid. The expiry messages 
did not start appearing until some 7 hours after the job release, which 
should have been more than enough time for the jobs to be successfully 
resubmitted, executed and possibly held again (these jobs are the only 
jobs in the queue, there is no other load, etc). Is my assumption that 
released Condor-G jobs are resubmitted on the next scheduling cycle 
incorrect?
Regards,
Jan Ploski