
[Condor-users] Idle, stuck Condor-G jobs



Hello,

I have a few Condor-G jobs that were originally held because of an (unexplained) error during stage-out. I then released these jobs manually, in the hope that they would be resubmitted. Now condor_q reports them as idle, with status STAGE_IN. I can see many messages in the Globus container.log mentioning org.globus.gsi.proxy.ProxyPathValidatorException. My proxy certificate has apparently expired, and I assume these messages are caused by Condor-G trying to submit these jobs.

Shouldn't Condor hold the jobs and report the expired proxy certificate as the hold reason in such a situation?

What also puzzles me is that when I released the jobs from the original hold, the proxy certificate was certainly valid. The expiry messages did not start appearing until about 7 hours after the release, which should have been more than enough time for the jobs to be resubmitted, executed successfully, and possibly held again (these are the only jobs in the queue, and there is no other load). Is my assumption incorrect that released Condor-G jobs are resubmitted in the next scheduling cycle?

Regards,
Jan Ploski