Re: [Condor-users] Jobs running forever...
On Wed, 19 Jul 2006 13:26:16 +0200
ucarlino@xxxxxxxxxx wrote:
> Hello,
> sometimes a job stays running without terminating, and
> the only thing that can be done is to kill it with 'condor_rm'.
>
> It is possible to avoid this by placing the following directives in the submit file:
> # Limit runtime to 30 minutes (30*60=1800 seconds)
> #
> maxRunTime = 1800
> #
> # Limit total time in queue to 12 hours (60*60*12=43200 seconds)
> #
> maxQueueTime = 43200
> #
> # Remove jobs exceeding maxRunTime or maxQueueTime
> #
> periodic_remove = (RemoteWallClockTime > $(maxRunTime)) || ((CurrentTime - QDate) > $(maxQueueTime))
>
>
> I was wondering if this configuration could be defined at pool level,
> avoiding the need to put it in every submit file.
Hello,
I was wondering the same thing, and I found a solution (not a very elegant
one, but it works...): I've added the following two lines to the local
configuration file of my schedd host:
PeriodicRemove=(RemoteWallClockTime > 10)
SUBMIT_EXPRS = PeriodicRemove
This way, the attribute PeriodicRemove (which is the classad counterpart of the
keyword periodic_remove in the submit file) gets appended to the job classad...
According to the documentation, this will not prevent users from overriding this
attribute by putting a line containing +PeriodicRemove=False in their submit file,
but it will at least provide a default value...
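As a rough sketch only (untested, and assuming the 30-minute and 12-hour limits
from the original question, with CurrentTime available when the expression is
evaluated), a pool-wide default combining both conditions might look like this:
# Sketch: remove jobs that ran for more than 1800 s or were queued more than 43200 s
PeriodicRemove = (RemoteWallClockTime > 1800) || ((CurrentTime - QDate) > 43200)
# Append to any existing list instead of overwriting it
SUBMIT_EXPRS = $(SUBMIT_EXPRS) PeriodicRemove
As far as I understand, these attributes are added by condor_submit at submit
time, so only jobs submitted after the change would pick up the default.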
Hope this helps...
Pascal