Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] Job is getting rerun instead of terminated

Date: Wed, 27 Jul 2005 08:32:17 +0200 (CEST)
From: Andreas Vetter <andreas.vetter@xxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] Job is getting rerun instead of terminated

Thank you for the explanation. I didn't realize that.

Jobs that run for more than 12 hours should be removed from the queue. The
users will not add an expression like that to their job ads, because they
just forget. How could I do this with config files? Is there something like
"defaults for submitting jobs" that I could change?
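
For reference, the kind of entry the users would have to remember is sketched below, following the periodic_remove suggestion in the quoted reply. This is only a sketch: the executable name is a placeholder, and the attribute names (JobStatus, EnteredCurrentStatus, CurrentTime) are written as I understand the job ad, so they should be checked against the manual for the installed Condor version.

```
# Hypothetical submit description file; "my_job" is a placeholder.
universe   = vanilla
executable = my_job

# Assumption: JobStatus == 2 means "running", and EnteredCurrentStatus is
# the time at which the job entered that state. The expression removes the
# job from the queue once the current run has lasted more than 12 hours.
periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > (12 * 3600))

queue
```

Note that this measures the length of the current run only, not the total time accumulated over earlier runs of the same job.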

On Fri, 22 Jul 2005, Jaime Frey wrote:

> On Jul 22, 2005, at 5:26 AM, Andreas Vetter wrote:
> 
> > We have a setup that is meant to terminate all jobs after 12 hours of
> > runtime. Most jobs are vanilla universe. But sometimes there are jobs
> > that are evicted after 12 hours and then started again on other nodes.
> > The user finally killed the job with condor_rm. Other jobs are
> > terminated after 12 hours as expected.
> > 
> > Attached is part 3 of our global condor config and the users log for the
> > restarting job.
> > 
> > Did I miss something?
> 
> When an execute machine kills a job for running too long, the schedd doesn't
> consider the job complete. It thinks that the execute machine wasn't willing
> to let the job run long enough and it now needs to find another machine that
> will let the job run to completion. When a job leaves the queue is controlled
> by the job ad in the schedd.
> 
> If you want your jobs to leave the queue when they run longer than 12 hours,
> you need to set periodic_remove in the job ads. If you want the jobs to stay
> in the queue but not get rerun, you need to modify the startd's requirements
> to not run jobs that previously ran for more than 12 hours.
> 
> +----------------------------------+---------------------------------+
> |            Jaime Frey            |  Public Split on Whether        |
> |        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
> |  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
> +----------------------------------+---------------------------------+
> 

-- 
 Andreas Vetter
