Re: [Condor-users] Groups with weighted slots in 7.6.9
- Date: Tue, 11 Sep 2012 14:04:07 -0700
- From: Erik Erlandson <eje@xxxxxxxxxx>
- Subject: Re: [Condor-users] Groups with weighted slots in 7.6.9
Hi William,
I agree that it looks like #2958, and that fix went in at 7.6.7.
Can you describe any configuration related to GROUP_AUTOREGROUP[_*]
and/or GROUP_ACCEPT_SURPLUS[_*]?
On Tue, 2012-09-11 at 16:37 -0400, William Strecker-Kellogg wrote:
> Hi all,
>
> There is an interesting problem I'm having, related to the use of group
> quotas and weighted slots. While experimenting I came across something
> that looks like a bug similar to the one that ticket #2958 was supposed
> to address.
>
> The setup involves a large number of machines with three 8-core slots
> each (about 2000 cores total). When using group quotas I see the
> following behavior:
>
> First, I submit 20 jobs matching only those slots (no other contention,
> plenty of free slots), each with "request_cpus = 8" and belonging to an
> AccountingGroup with a quota of >2000. I see the following (grep for
> "group_atlas.prod.mp" in the attached logs for the full story): the
> first two jobs match, then the rest are rejected with "group quota
> exceeded" warnings. It appears that the groupQuota it sees is 20 (the
> number of idle jobs); the first match uses 8, the second brings the
> total used to 16, and the next fails because "pieLeft" is 4.0. It is as
> if the weights are being applied only after it matches and are not
> accounted for in its matchmaking limit (pieLeft is 20.0 at the
> start, should be 160.0?)
>
> It is reproducible with numbers other than 20 jobs and 8 cores: with N
> k-core jobs in the queue, up to floor(N/k) jobs will match before
> exceeding the quota.
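
The arithmetic reported above can be sketched in a few lines of Python. This is only a toy model of the described behavior, not Condor's negotiator code: the function name `matched_jobs` and the `weighted_limit` flag are hypothetical, and the rule "a job matches only while its slot weight fits in pieLeft" is an assumption inferred from the numbers in the report (20.0, then 12.0, then 4.0).

```python
def matched_jobs(idle_jobs, slot_weight, weighted_limit):
    """Count how many jobs match before the remaining share runs out.

    Toy model (assumption, not Condor source): the starting limit is either
    the raw number of idle jobs (the reported behavior) or that number
    scaled by the slot weight (what the report suggests it should be).
    Each match then deducts the slot weight from the remaining share.
    """
    pie_left = float(idle_jobs * slot_weight if weighted_limit else idle_jobs)
    matched = 0
    for _ in range(idle_jobs):
        if slot_weight > pie_left:
            break  # corresponds to the "group quota exceeded" rejection
        pie_left -= slot_weight
        matched += 1
    return matched, pie_left


# Reported case: 20 idle jobs, each needing an 8-core slot of weight 8.
observed = matched_jobs(20, 8, weighted_limit=False)
expected = matched_jobs(20, 8, weighted_limit=True)
print(observed)  # (2, 4.0): two matches, then pieLeft is 4.0
print(expected)  # (20, 0.0): all jobs match when the limit starts at 160.0
```

Under the same assumptions, `matched_jobs(20, 1, weighted_limit=False)` returns 20 matches, which is consistent with the SlotWeight=1 workaround described in the report.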
>
> The workaround I found is to set "SlotWeight=1" on the 8-core slots,
> which makes things work great except for the accounting (which doesn't
> matter for what we are doing right now).
>
> We may be going to 7.8 soon so it may not be an issue if it is fixed
> then, but in case it isn't I figured I'd report on my findings anyway.
>
> Thanks,
> Will Strecker-Kellogg
> RACF/BNL