[HTCondor-users] Help understanding SUSPEND expression
- Date: Thu, 07 Mar 2024 21:21:34 +0000
- From: Angel de Vicente <angel.vicente.garrido@xxxxxxxxx>
- Subject: [HTCondor-users] Help understanding SUSPEND expression
Hello,
despite having used HTCondor for a long time, today I realized
that I'm having trouble with the SUSPEND expression, so I hope somebody
can shed some light here...
For a long time, the SUSPEND expression in our machines has been:
,----
| SUSPEND = ( ((CpuBusyTime > 2 * $(MINUTE)) && ($(ActivationTimer) > 90)) \
| || ( (WorkerType == "desktop" || WorkerType == "burro_pro") && $(KeyboardBusy) ) )
`----
WorkerType is just a custom attribute that we add to our machines, so that
on some of them keyboard activity is taken into account while on
others it is not.
Anyway, today I found that HTCondor was not evicting jobs despite a high
load on the machine, so I had a look at the SUSPEND expression.
I see references to CpuBusyTime in several documents, but our machine no
longer advertises CpuBusyTime (this is HTCondor 23.0.4); the grep prints nothing:
,----
| $ condor_status -l xxxx.xxx | grep -i cpubusytime
`----
so I guess it is no surprise that the job was not being evicted, since I
assume SUSPEND was never evaluated to True.
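To make my reasoning concrete, here is a minimal Python sketch of how I think the old expression evaluates. It assumes ClassAd-style three-valued logic, where a comparison against a missing attribute gives UNDEFINED, and && / || only give a definite answer when one operand decides the result. The helper names (UNDEFINED, gt, and_, or_, suspend) are made up for this sketch and are not HTCondor APIs, and the input values are purely illustrative.

```python
# Toy model of ClassAd three-valued logic; this is not HTCondor code.
UNDEFINED = None  # stands in for the ClassAd UNDEFINED value


def gt(a, b):
    """a > b, but UNDEFINED if either operand is undefined."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a > b


def and_(a, b):
    """False decides the result; otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def or_(a, b):
    """True decides the result; otherwise UNDEFINED propagates."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def suspend(cpu_busy_time, activation_timer, is_desktop, keyboard_busy):
    """Mirror of our old SUSPEND expression, with MINUTE taken as 60."""
    cpu_part = and_(gt(cpu_busy_time, 2 * 60), gt(activation_timer, 90))
    kbd_part = and_(is_desktop, keyboard_busy)
    return or_(cpu_part, kbd_part)


# CpuBusyTime missing, keyboard idle: the result is UNDEFINED, not True.
print(suspend(UNDEFINED, 500, True, False))
# CpuBusyTime missing, keyboard busy on a desktop: the keyboard branch still fires.
print(suspend(UNDEFINED, 500, True, True))
```

Under these assumptions, with CpuBusyTime missing only the keyboard branch can ever make SUSPEND True, which would match what I am seeing on a loaded machine with no keyboard activity.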
I assumed that this has probably changed in some recent HTCondor
version, so I looked into the current POLICY:DESKTOP template, which
reads:
# $ condor_config_val use policy:desktop
# use POLICY:DESKTOP is
# if ! defined PolicyExprFragments
# use FEATURE : POLICY_EXPR_FRAGMENTS
# endif
# STARTD_LATCH_EXPRS = $(STARTD_LATCH_EXPRS) CpuBusy
# CpuBusyTimer=IfThenElse(CpuBusyValue is 1, time() - CpuBusyTime, 0)
# WANT_SUSPEND=($(SmallJob) || $(KeyboardNotBusy) || $(IsVanilla) ) && ( $(SUSPEND))
# WANT_VACATE=$(ActivationTimer) > 600 || $(IsVanilla)
# SUSPEND=($(KeyboardBusy) || ( ($(CpuBusyTimer) > 120) && $(ActivationTimer) > 90))
# CONTINUE=($(CPUIdle) && ($(ActivityTimer) > 10) && (KeyboardIdle > $(ContinueIdleTime)))
# PREEMPT=(((Activity == "Suspended") && ($(ActivityTimer) > $(MaxSuspendTime))) || (SUSPEND && (WANT_SUSPEND == False)))
# START=((KeyboardIdle > $(StartIdleTime)) && ( $(CPUIdle) || (State != "Unclaimed" && State != "Owner")) )
# KILL=False
# MaxJobRetirementTime=0
# CLAIM_WORKLIFE=
# SLOTS_CONNECTED_TO_KEYBOARD=1024*1024
# SLOTS_CONNECTED_TO_CONSOLE=1024*1024
# IS_OWNER=(START =?= False)
OK, so SUSPEND is defined in terms of the macro CpuBusyTimer, which is
in turn defined in terms of CpuBusyValue and CpuBusyTime, but none of them is defined:
,----
| $ condor_status -l xxxx.xxx | grep -i cpubusytimer
`----
so if I understand correctly that part of the expression
($(CpuBusyTimer) > 120) is never going to be true, and as such my jobs
will never try to suspend.
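As a sanity check of that reading, here is a small Python sketch of the CpuBusyTimer line from the template. It assumes that the ClassAd "is" operator always yields True or False (False when only one side is undefined) and that IfThenElse returns its third argument when the condition is False. The function names is_ and cpu_busy_timer are hypothetical, the type handling is simplified, and the timestamps are invented.

```python
import time

# Toy model of the template's CpuBusyTimer macro; this is not HTCondor code.
UNDEFINED = None  # stands in for the ClassAd UNDEFINED value


def is_(a, b):
    """Simplified 'is' comparison: never returns UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return a == b


def cpu_busy_timer(cpu_busy_value, cpu_busy_time, now=None):
    """IfThenElse(CpuBusyValue is 1, time() - CpuBusyTime, 0)."""
    now = int(time.time()) if now is None else now
    if is_(cpu_busy_value, 1):
        if cpu_busy_time is UNDEFINED:
            return UNDEFINED  # arithmetic on an undefined operand
        return now - cpu_busy_time
    return 0


# Both attributes missing: the timer is 0, so "timer > 120" is False.
print(cpu_busy_timer(UNDEFINED, UNDEFINED, now=1000) > 120)
# CPU busy since t=700, evaluated at t=1000: the timer is 300 seconds.
print(cpu_busy_timer(1, 700, now=1000))
```

If those assumptions hold, a missing CpuBusyValue makes the timer 0 rather than undefined, but the outcome is the same: the CPU part of SUSPEND cannot become True.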
It has been a long time since I last played with these expressions, so
surely I'm missing something?
Any help appreciated,
--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
GPG: 0x8BDC390B69033F52