Hi friends,
Another fun question has come up. We have a customer who is seeing
some issues with quotas not being properly respected. Here's the
relevant config from the 16-core test pool:
GROUP_NAMES = mrm, sp, erm
GROUP_QUOTA_MRM = 11
GROUP_QUOTA_SP = 2
GROUP_QUOTA_ERM = 2
GROUP_ACCEPT_SURPLUS = True
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) \
  && (RemoteGroupResourcesInUse > RemoteGroupQuota) \
  && (RemoteGroup =!= SubmitterGroup)
However, when user 'sp.sp_high' has 10 jobs running and then user
'mrm.mrm_daily' submits 10 jobs, only 6 mrm jobs start. The negotiator
refuses to preempt because it determines that PREEMPTION_REQUIREMENTS
evaluates to False. Here are the values I get from using debug():
SubmitterGroupResourcesInUse 6
SubmitterGroupQuota 10
RemoteGroupResourcesInUse 6
RemoteGroupQuota 10
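To double-check the arithmetic, here is a minimal Python sketch of the
expression above. It is only an illustration, not how the negotiator
actually evaluates ClassAds: the function name is hypothetical, =!= is
approximated with Python's !=, and the group names "mrm" and "sp" are
assumed, since the debug() output does not show them.

```python
def preemption_requirements(submitter_in_use, submitter_quota,
                            remote_in_use, remote_quota,
                            submitter_group, remote_group):
    """Rough model of the PREEMPTION_REQUIREMENTS expression (hypothetical helper)."""
    return (submitter_in_use < submitter_quota
            and remote_in_use > remote_quota
            and remote_group != submitter_group)  # stand-in for ClassAd =!=

# Values reported by debug(); the group names are assumed.
observed = preemption_requirements(6, 10, 6, 10, "mrm", "sp")

# Same values, but with RemoteGroupQuota set to GROUP_QUOTA_SP (2).
with_configured_quota = preemption_requirements(6, 10, 6, 2, "mrm", "sp")

print(observed, with_configured_quota)
```

With the reported values, the second clause (6 > 10) fails, so the whole
expression is False. If RemoteGroupQuota were 2, it would be True.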
So RemoteGroupQuota is clearly not the same as GROUP_QUOTA_SP. From my
understanding of section 3.4.3 of the manual, they should be the same.
On the other hand, the negotiator log has this entry:
group quotas: groups= 4 requesting= 2 served= 2 unserved= 0 slots= 16 requested= 20 allocated= 20 surplus= 11
So it seems plausible that when GROUP_ACCEPT_SURPLUS is true,
RemoteGroupQuota is the greater of GROUP_QUOTA_<GROUP> and the
number of slots the group is using. If that's the case, then the
manual doesn't make it clear (or makes it clear somewhere other than
section 3.4.3).
I'll note that the customer only reported this behavior after
upgrading from version 7.8 to version 8.0.2. Given the noise they've
made about it, I suspect it really is new behavior. I wonder if it's
related to https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3695
Thanks,
BC
--
Ben Cotton
main: 888.292.5320
Cycle Computing
Leader in Utility HPC Software
http://www.cyclecomputing.com
twitter: @cyclecomputing