[HTCondor-users] Troubleshooting job eviction by machine RANK
- Date: Wed, 03 Feb 2016 17:09:41 -0600
- From: Graham Allan <allan@xxxxxxxxxxxxxxx>
- Subject: [HTCondor-users] Troubleshooting job eviction by machine RANK
We've been running HTCondor here for many years, but with a fairly static
configuration in which each research group's jobs have run only on the
machines that group owns.
We did this by defining a variable, "CondorGroup", which is set per group,
copied into job ads through SUBMIT_EXPRS, and tested in the START expression,
something like:
CondorGroup = "novafarm"
SUBMIT_EXPRS = CondorGroup, $(SUBMIT_EXPRS)
START = (CondorGroup =?= "novafarm") || (CondorGroup =?= "system")
We're now trying (belatedly!) to enable opportunistic scheduling so that
idle resources are used more efficiently.
I am trying to do this by setting a machine RANK, and letting jobs run on
idle systems belonging to other groups if they define a "CanEvict" attribute:
CondorGroup = "novafarm"
SUBMIT_EXPRS = CondorGroup, $(SUBMIT_EXPRS)
START = (CondorGroup =?= "novafarm") || (CondorGroup =?= "system") || CanEvict
RANK = (20 * CondorGroup =?= "novafarm") + (10 * CondorGroup =?= "bes3farm")
MachineMaxVacateTime = 300
Someone can then submit a job with, e.g., +CondorGroup = "general" and
+CanEvict = True to run on any vacant slot.
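A minimal Python sketch of how the extended START expression is expected to evaluate for a job, assuming CondorGroup and CanEvict are not also published in the machine ad (so the unqualified names resolve to the job's attributes). `UNDEFINED`, `classad_or` and the other helpers are hypothetical stand-ins for ClassAd three-valued logic, not an HTCondor API. What it illustrates: a guest job without +CanEvict leaves START UNDEFINED rather than FALSE, and only a TRUE result lets the job match.

```python
# Hypothetical model of ClassAd three-valued logic for the START line above;
# UNDEFINED stands in for a missing attribute. Not an HTCondor API.
UNDEFINED = object()


def meta_equal(a, b):
    """ClassAd '=?=': never UNDEFINED; TRUE only for the same type and value."""
    return type(a) is type(b) and a == b


def classad_or(a, b):
    """ClassAd '||' on TRUE/FALSE/UNDEFINED: any TRUE wins, else UNDEFINED propagates."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def start(job):
    # START = (CondorGroup =?= "novafarm") || (CondorGroup =?= "system") || CanEvict
    group = job.get("CondorGroup", UNDEFINED)
    own_or_system = classad_or(meta_equal(group, "novafarm"), meta_equal(group, "system"))
    return classad_or(own_or_system, job.get("CanEvict", UNDEFINED))


def job_may_start(job):
    # Only a TRUE result matches; FALSE and UNDEFINED both mean "does not match".
    return start(job) is True


if __name__ == "__main__":
    print(job_may_start({"CondorGroup": "novafarm"}))  # True
    print(job_may_start({"CondorGroup": "general", "CanEvict": True}))  # True
    print(job_may_start({"CondorGroup": "general"}))  # False
```

Modelling `=?=` separately from `||` mirrors why the original config uses `=?=` for the group test: it yields a plain TRUE/FALSE even when CondorGroup is missing, so only the CanEvict term can leave START UNDEFINED.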
What we find is that these jobs do start on other groups' machines,
but when the owning group then submits its own jobs, the guest jobs are never
evicted.
Apart from the following, I have nothing else defined regarding preemption:
WANT_SUSPEND = TRUE
WANT_VACATE = TRUE
SUSPEND = FALSE
CONTINUE = TRUE
PREEMPT = FALSE
KILL = FALSE
PREEMPTION_REQUIREMENTS = FALSE
PREEMPTION_RANK = 0
My understanding was that RANK by itself should achieve the desired
effect: a job from the owning group should rank higher on its own machines
than a guest job, and so preempt it.
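The rank comparison relied on here can be sketched as a simplified Python model; the helper names are hypothetical, and booleans are assumed to count as 0 and 1 in arithmetic. One detail worth checking: in ClassAd expressions `*` binds more tightly than `=?=`, so `20 * CondorGroup =?= "novafarm"` groups as `(20 * CondorGroup) =?= "novafarm"`. Multiplying a string gives ERROR, the meta-comparison against a string is then FALSE, and under this model the RANK line as written gives every job the same value. The sketch contrasts that with explicitly parenthesised comparisons.

```python
# Simplified, hypothetical model of the RANK line above; ERROR stands in for
# the ClassAd ERROR value. Not an HTCondor API.
ERROR = object()


def mul(a, b):
    """ClassAd '*': numbers multiply (booleans count as 0/1); anything else is ERROR."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a * b  # bool is a subclass of int in Python, so True acts as 1
    return ERROR


def meta_equal(a, b):
    """ClassAd '=?=': TRUE only if both operands have the same type and value."""
    return type(a) is type(b) and a == b


def rank_as_written(group):
    # (20 * CondorGroup =?= "novafarm") parses as ((20 * CondorGroup) =?= "novafarm")
    novafarm = meta_equal(mul(20, group), "novafarm")
    bes3farm = meta_equal(mul(10, group), "bes3farm")
    return int(novafarm) + int(bes3farm)


def rank_intended(group):
    # 20 * (CondorGroup =?= "novafarm") + 10 * (CondorGroup =?= "bes3farm")
    return mul(20, meta_equal(group, "novafarm")) + mul(10, meta_equal(group, "bes3farm"))


def rank_preempts(rank, running_group, waiting_group):
    """Simplified startd-rank preemption: evict only for a strictly higher RANK."""
    return rank(waiting_group) > rank(running_group)


if __name__ == "__main__":
    print(rank_as_written("novafarm"), rank_intended("novafarm"))  # 0 20
    print(rank_preempts(rank_as_written, "general", "novafarm"))  # False
    print(rank_preempts(rank_intended, "general", "novafarm"))  # True
```

The preemption test is deliberately reduced to a strict "higher RANK wins" comparison; real negotiator behaviour also depends on settings such as retirement time, which this model ignores.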
Have I written enough for anyone to say where I'm going wrong?
Thanks, Graham
--