Re: [Condor-users] Specifying Max Job Run Time
- Date: Mon, 23 Oct 2006 14:21:29 -0500
- From: Burt Holzman <burt@xxxxxxxx>
- Subject: Re: [Condor-users] Specifying Max Job Run Time
Terrence Martin wrote:
I do not suppose there is a recipe out there for restricting how long a
job runs, while making sure that jobs are not interrupted before that
run time and that long-running jobs are kicked only if other jobs are
waiting in the queue? Say a value of 72 hours that is both the min and
max runtime for jobs, except that if the jobs are the only ones on the
cluster they can just keep running.
There are a lot of settings, it seems, from PREEMPTION_REQUIREMENTS to
MaxJobRetirementTime to PREEMPT_LATENCY. It is not at all clear to me
how to combine these settings to do what I want: put an upper limit on
jobs, after which they cannot be guaranteed to run, while not kicking
off jobs that may be running long for legitimate reasons on an
otherwise underused cluster.
Terrence,
It sounds like you just need jobs to preempt quickly but with a long
retirement time. We have this at the CMS Tier 1 at FNAL.
MaxJobRetirementTime is set to 48 hours. PREEMPTION_REQUIREMENTS is
essentially set to
(CurrentTime - EnteredCurrentState) > (10*60)
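In this configuration, a running job becomes eligible for preemption
after 10 minutes if other jobs are waiting in the queue, but the
preemption does not actually evict the running job until it has run
for 48 hours. If nothing is waiting, nothing preempts it and it simply
keeps running.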
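As a rough sketch, the two settings described above might look like this
in the Condor configuration. This is an illustration rather than the
exact FNAL configuration: the real PREEMPTION_REQUIREMENTS there is only
"essentially" this expression and may carry extra clauses, and writing
the 48 hours as a product of seconds is my assumption about how the
value is expressed.

```
# Execute-node (startd) side: once a job has been preempted, let it keep
# running until it has had 48 hours of runtime. The value is in seconds.
MAXJOBRETIREMENTTIME = 48 * 60 * 60

# Negotiator side: allow a waiting job to preempt a slot once the slot
# has been in its current state for more than 10 minutes.
PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (10 * 60)
```

For the 72-hour limit in the original question, the same pattern should
apply with MAXJOBRETIREMENTTIME set to 72 * 60 * 60.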
- B