Re: [HTCondor-devel] Is this a negotiator or a documentation bug?


Date: Wed, 20 Nov 2013 16:16:33 -0500
From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Is this a negotiator or a documentation bug?
Any brilliant ideas?


Thanks,
BC

On Fri, Nov 15, 2013 at 5:05 PM, Ben Cotton
<ben.cotton@xxxxxxxxxxxxxxxxxx> wrote:
> Hi friends,
>
> Another fun question has come up. We have a customer who is seeing
> some issues with quotas not being properly respected. Here's the
> relevant config from the 16-core test pool:
>
> GROUP_NAMES = mrm, sp, erm
> GROUP_QUOTA_MRM = 11
> GROUP_QUOTA_SP = 2
> GROUP_QUOTA_ERM = 2
> GROUP_ACCEPT_SURPLUS = True
> PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) \
>     && (RemoteGroupResourcesInUse > RemoteGroupQuota) \
>     && (RemoteGroup =!= SubmitterGroup)
>
> However, when user 'sp.sp_high' has 10 jobs running and then user
> 'mrm.mrm_daily' submits 10 jobs, only 6 mrm jobs start. The negotiator
> refuses to preempt because it determines that PREEMPTION_REQUIREMENTS
> evaluates to False. Here are the values I get from using debug():
>
> SubmitterGroupResourcesInUse 6
> SubmitterGroupQuota 10
> RemoteGroupResourcesInUse 6
> RemoteGroupQuota 10
>
> So RemoteGroupQuota is clearly not the same as GROUP_QUOTA_SP. From my
> understanding of section 3.4.3 of the manual, they should be the same.
> On the other hand, the negotiator log has this entry:
>
> group quotas: groups= 4  requesting= 2  served= 2  unserved= 0  slots= 16  requested= 20  allocated= 20  surplus= 11
>
> So it also stands to reason that when GROUP_ACCEPT_SURPLUS is true,
> then RemoteGroupQuota is the greater of GROUP_QUOTA_<GROUP> or the
> number of slots a group is using. If that's the case, then it's not
> clear in the manual (or is made clear somewhere other than section
> 3.4.3).
>
> I'll note that the customer only reported this behavior after
> upgrading from version 7.8 to version 8.0.2. Given the noise they've
> made about it, I suspect it really is new behavior. I wonder if it's
> related to https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3695
>
>
> Thanks,
> BC
>
> --
> Ben Cotton
> main: 888.292.5320
>
> Cycle Computing
> Leader in Utility HPC Software
>
> http://www.cyclecomputing.com
> twitter: @cyclecomputing
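
To make the arithmetic in the quoted message concrete, here is a minimal Python sketch of the PREEMPTION_REQUIREMENTS expression. It is only an illustration, not negotiator code: the function name preemption_allowed is made up, the group names are taken from GROUP_NAMES, and =!= is modeled as a plain string inequality, which assumes both group attributes are defined. It compares the result for the values debug() reported with the result if RemoteGroupQuota were equal to GROUP_QUOTA_SP.

```python
def preemption_allowed(submitter_in_use, submitter_quota,
                       remote_in_use, remote_quota,
                       submitter_group, remote_group):
    """Mirror of the PREEMPTION_REQUIREMENTS expression from the config.

    Hypothetical helper for illustration only. The ClassAd operator =!=
    is approximated with !=, assuming both group names are defined.
    """
    return (submitter_in_use < submitter_quota
            and remote_in_use > remote_quota
            and remote_group != submitter_group)


# Values reported by debug(): RemoteGroupQuota is 10.
reported = preemption_allowed(6, 10, 6, 10, "mrm", "sp")

# Same counts, but with RemoteGroupQuota set to GROUP_QUOTA_SP = 2.
configured = preemption_allowed(6, 10, 6, 2, "mrm", "sp")

print("with reported RemoteGroupQuota (10):", reported)
print("with GROUP_QUOTA_SP (2):", configured)
```

With the reported values the middle clause (6 > 10) fails, so the whole expression is False; with a remote quota of 2 all three clauses hold.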



-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing