
Re: [HTCondor-users] [resubmit jobs]



On 7/1/2014 11:03 AM, Sunshine wrote:
I submitted some jobs.
A few of them took 2 hours to complete, but I think they should take about 20 minutes, and some similar jobs did indeed finish within 20 minutes.
I think something is wrong with my jobs or the cluster.


My question: how do I let a job restart after a specific time?
For example, if a job didn't finish within 5 minutes, can the job be resubmitted, or restarted on a different machine?


For example, I submit 100 jobs and 99 of them finish within 20 minutes, but one job takes 2 hours; I want to resubmit that job.


I used the following:
periodic_remove = (CurrentTime - EnteredCurrentStatus > 60*20)
Then I check the log and resubmit the failed job.


Any better ideas?


Yes.  See

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAutoRetryElsewhere
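
As a rough illustration of that kind of policy (a minimal sketch, not copied from the wiki page), the submit file could hold a job that has been running too long, release it so it is matched again, and steer it away from the machines it last ran on. The 20-minute limit, the cap of 5 starts, and the choice to avoid the last two machines are assumptions taken from or added to the numbers in the question; adjust them to your pool. Note that a held job is evicted, so it starts over unless it checkpoints.

```
# Sketch only: the time limit and retry cap below are assumed values.

# Hold a job that has been in the Running state (JobStatus == 2)
# for more than 20 minutes.
periodic_hold = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 20 * 60)

# Release jobs that were held by periodic_hold (HoldReasonCode == 3)
# so they get rescheduled, but give up after 5 starts.
periodic_release = (HoldReasonCode == 3) && (NumJobStarts < 5)

# Record the names of the machines used by recent runs in the job ad
# (MachineAttrMachine1 is the most recent, MachineAttrMachine2 the one before).
job_machine_attrs = Machine
job_machine_attrs_history_length = 5

# Do not match the two machines this job ran on most recently.
requirements = (TARGET.Machine =!= MachineAttrMachine1) && (TARGET.Machine =!= MachineAttrMachine2)
```

Compared with periodic_remove, this keeps the job in the queue, so there is no need to read the log and resubmit by hand.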


regards,
Todd