Mailing List Archives
Re: [HTCondor-users] Delayed hold
Rita,
If you want quicker holds, you may want to set WANT_HOLD on your
execute nodes. From section 3.3.10 of the manual:
###
WANT_HOLD
A boolean expression that defaults to False. When True and the value
of PREEMPT becomes True and WANT_SUSPEND is False and
MAXJOBRETIREMENTTIME has expired, the job is put on hold for the
reason (optionally) specified by the variables WANT_HOLD_REASON and
WANT_HOLD_SUBCODE. As usual, the job owner may specify
periodic_release and/or periodic_remove expressions to react to
specific hold states automatically. The attribute HoldReasonCode in
the job ClassAd is set to the value 21 when WANT_HOLD is responsible
for putting the job on hold.
Here is an example policy that puts jobs on hold that use too much
virtual memory:
VIRTUAL_MEMORY_AVAILABLE_MB = (VirtualMemory*0.9)
MEMORY_EXCEEDED = ImageSize/1024 > $(VIRTUAL_MEMORY_AVAILABLE_MB)
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ($(MEMORY_EXCEEDED))
WANT_HOLD_REASON = \
   ifThenElse( $(MEMORY_EXCEEDED), \
               "Your job used too much virtual memory.", \
               undefined )
###
This will help avoid the "job crossed the memory threshold right after
the shadow received an update" issue.
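
On the submit side, the job owner can react to this hold state as the
manual excerpt describes. A minimal sketch of a submit description
fragment, assuming you would rather remove such jobs than leave them
held: the executable name and memory request are hypothetical
placeholders, and only the periodic_remove line is the point. It relies
on HoldReasonCode being 21 for WANT_HOLD holds (as quoted above) and on
JobStatus 5 meaning "held".

```
# Hypothetical submit description fragment; adjust names and values.
executable     = my_job.sh
request_memory = 2048

# Remove the job once it is held (JobStatus == 5) because the execute
# node's WANT_HOLD policy fired (HoldReasonCode == 21).
periodic_remove = (JobStatus == 5) && (HoldReasonCode == 21)

queue
```

If you would rather inspect the jobs first, leave periodic_remove out
and the jobs will simply stay on hold with the reason text from
WANT_HOLD_REASON.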
Thanks,
BC
--
Ben Cotton
main: 888.292.5320
Cycle Computing
Leader in Utility HPC Software
http://www.cyclecomputing.com
twitter: @cyclecomputing