Re: [Condor-users] Job is getting rerun instead of terminated
Thank you for the explanation. I didn't realize that.
Jobs that run for more than 12 hours should be removed from the queue. The
users will not add such an expression to their job ads themselves, because
they simply forget. How could I enforce this through the config files? Is
there something like "defaults for submitting jobs" that I could change?
On Fri, 22 Jul 2005, Jaime Frey wrote:
> On Jul 22, 2005, at 5:26 AM, Andreas Vetter wrote:
>
> > we have a setup that is meant to terminate all jobs after 12 hours
> > runtime. Most jobs are vanilla universe. But sometimes there are jobs
> > that are evicted after 12 hours and then started again on other nodes.
> > The user finally killed the job with condor_rm. Other jobs are
> > terminated after 12 hours as expected.
> >
> > Attached is part 3 of our global condor config and the users log for the
> > restarting job.
> >
> > Did I miss something?
>
> When an execute machine kills a job for running too long, the schedd doesn't
> consider the job complete. It thinks that the execute machine wasn't willing
> to let the job run long enough and it now needs to find another machine that
> will let the job run to completion. The job ad in the schedd controls when a
> job leaves the queue.
>
> If you want your jobs to leave the queue when they run longer than 12 hours,
> you need to set periodic_remove in the job ads. If you want the jobs to stay
> in the queue but not get rerun, you need to modify the startd's requirements
> to not run jobs that previously ran for more than 12 hours.
>
> +----------------------------------+---------------------------------+
> | Jaime Frey | Public Split on Whether |
> | jfrey@xxxxxxxxxxx | Bush Is a Divider |
> | http://www.cs.wisc.edu/~jfrey/ | -- CNN Scrolling Banner |
> +----------------------------------+---------------------------------+
--
Andreas Vetter
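
For reference, the periodic_remove suggestion quoted above might look
roughly like the following in a user's submit description file. This is
only a sketch under assumptions: the executable and log names are
hypothetical, and the expression uses the job ad attributes JobStatus
(2 = running), EnteredCurrentStatus and CurrentTime, which should be
checked against the manual for the installed Condor version.

```
# Hypothetical submit description file; file names are placeholders.
universe        = vanilla
executable      = my_job
log             = my_job.log

# Assumption: remove the job from the queue once it has been in the
# running state (JobStatus == 2) for more than 12 hours (43200 seconds).
periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 43200)

queue
```

Note that this expression measures the time since the job last entered
the running state, not the cumulative runtime across restarts, and it
still has to be present in every job ad, so by itself it does not answer
the question about pool-wide defaults.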