Mailing List Archives, UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] group <none> appeared after upgrade from 7.4.3 to 7.6.2

Date: Wed, 19 Oct 2011 11:49:31 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] group <none> appeared after upgrade from 7.4.3 to 7.6.2


Yes, priority factor only influences allocation of resources within a group.

As for how to configure priority factors: if you set GROUP_PRIO_FACTOR for all other groups, I believe the default priority factor for users in the <none> group can be set with DEFAULT_PRIO_FACTOR and/or REMOTE_PRIO_FACTOR, depending on how ACCOUNTANT_LOCAL_DOMAIN is configured.

I agree that it is worth considering adding some mechanism to Condor whereby the <none> group is considered last in the negotiation cycle. This would more closely mimic the old behavior prior to 7.5.6. Of course, it may not always be desired to starve the <none> group when there is contention for a subset of the pool, so this would probably need to be a configurable option. Ugh ... more knobs.


--Dan

On 10/19/11 11:24 AM, Joe Boyd wrote:

I couldn't figure out the syntax to tune anything for the <none> group.

GROUP_PRIO_FACTOR_<none>

throws an error and

GROUP_PRIO_FACTOR_none

didn't do anything because it's not the name of the group.  I didn't try

GROUP_PRIO_FACTOR_\0
I don't think that would help anyway. Since quotas are defined for all jobs now even if you didn't configure it that way yourself, the slots are handed out in the order of group starvation. The priority factor would only determine which submitter WITHIN the group currently being negotiated gets the slots. Someone please correct me if I'm wrong.
joe
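
The two-level ordering described above can be sketched in Python. This is a simplified model, not Condor's negotiator code: the group names, quotas, usage figures, user priorities, and the `starvation` helper are all hypothetical, and starvation is assumed here to be usage divided by quota.

```python
# Simplified model of negotiation order; NOT Condor's actual implementation.
# Group names, quotas, usage, and priorities are made up for illustration.

def starvation(group):
    """Fraction of its quota a group is using (lower = more starved)."""
    return group["usage"] / group["quota"]

groups = [
    {"name": "group_a", "quota": 100, "usage": 90,
     "submitters": {"alice": 10.0, "bob": 2.0}},  # effective user priorities
    {"name": "<none>", "quota": 200, "usage": 20,
     "submitters": {"carol": 5.0}},
    {"name": "group_b", "quota": 50, "usage": 25,
     "submitters": {"dave": 1.0, "erin": 8.0}},
]

# Step 1: groups are visited most-starved first; priority factors play no role.
group_order = sorted(groups, key=starvation)

# Step 2: within each group, submitters are ordered by effective priority
# (a lower value is a better priority). The priority factor only feeds into
# this inner ordering.
negotiation_order = [
    (g["name"], sorted(g["submitters"], key=g["submitters"].get))
    for g in group_order
]

for name, submitters in negotiation_order:
    print(name, submitters)
```

In this model <none> is negotiated first simply because it uses the smallest fraction of its quota, whatever priority factor its submitters carry.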

On 10/19/2011 11:04 AM, Steven Timm wrote:
Is it possible to set the GROUP_PRIORITY_FACTOR of the <none> group
to a very high value?

Steve

On Wed, 19 Oct 2011, Joe Boyd wrote:
Hi Dan,
Thanks, GROUP_DYNAMIC_MACH_CONSTRAINT fixes my problem. We use glideins with
the monitoring slot enabled so our pool always thinks we have twice as many
slots as we actually have usable. Setting

GROUP_DYNAMIC_MACH_CONSTRAINT = ( IS_MONITOR_VM =!= True )

makes it ignore them and should get things back to normal.
It still seems like <none> should be considered a "special" group and should
always be handed slots last in the list of groups instead of based on its
starvation rate as the rest of the groups are sorted. The cluster
administrator set the quotas on the real defined groups because they care
about those getting slots, and the <none> group, which doesn't actually exist
and is made up on the fly, should be forced to take a back seat to the defined
groups and not be put before them even if its starvation rate puts it
higher. It seems like this would fix a lot of issues related to this.

Thanks for the quick response!!

joe

On 10/18/2011 04:16 PM, Dan Bradley wrote:
On 10/18/11 4:02 PM, Erik Erlandson wrote:
If you search for the first "Matched" line what you'll see is that the
jobs that were submitted without a group are now apparently in the group
"<none>" and that group actually has a quota somehow (the group doesn't
actually exist so it certainly doesn't have a quota). Jobs for that
"group" get run in front of the groups that haven't filled their quota.
Hi Joe,

Accounting groups were enhanced to support fully generalized
Hierarchical Accounting Groups (HGQ), as of 7.5.6:

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1393

There is always a root node in the accounting group hierarchy, whose
name is "<none>", and any job that does not map to some other accounting
group will be assigned to <none>. This group always accepts any surplus
quota not used by other groups.

You may also be interested in:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1926
(dev release only, not on current stable series 7.6)
Joe,
It sounds like in your case, the <none> group is being considered before most
other groups because it is more "starved", meaning it is using a smaller
fraction of its share of the pool compared to most other groups. If I understand
the new group quota system correctly, group <none>'s share of the pool is
determined by computing the share of the pool for all the other groups and
counting what is left. In your pool there are 10216 slots. 5682 of these are
being assigned to group <none>.
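
As a rough illustration of the "counting what is left" arithmetic, here is a minimal Python sketch. Only the pool size (10216) and the <none> share (5682) come from this thread; the individual group names and quotas are hypothetical and were chosen only so that they sum to the implied remainder.

```python
# Sketch of how <none>'s share falls out as a remainder (simplified model,
# not Condor's actual quota computation).
TOTAL_SLOTS = 10216  # pool size reported in this thread

# Hypothetical per-group quotas; only their sum (4534) is implied by the
# figures quoted in the thread.
other_group_quotas = {"group_a": 3000, "group_b": 1000, "group_c": 534}

# Whatever the defined groups do not claim is left to <none>.
none_share = TOTAL_SLOTS - sum(other_group_quotas.values())
print(none_share)
```

Under this model, any overestimate of the pool size lands entirely in the <none> share, since the defined groups' quotas are subtracted from the total.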
One thing that can cause trouble is if you have special slots that are not
available to all jobs. In this case, the size of the pool may be effectively
overestimated. The result is that dynamic quotas are too big, and groups which
are considered first may get too many slots, while groups that follow will
starve. GROUP_DYNAMIC_MACH_CONSTRAINT can be used to attempt to work around this
problem. So can GROUP_QUOTA_ROUND_ROBIN_RATE.
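
The effect of special slots on dynamic quotas can be sketched in Python. This is a toy model under stated assumptions, not how the negotiator is implemented: the slot list, the 0.25 quota fraction, and the helper names `counted_pool_size` and `not_monitor` are hypothetical. The filter mirrors the ClassAd expression ( IS_MONITOR_VM =!= True ) used elsewhere in this thread.

```python
# Simplified model: a dynamic quota is a fraction of the counted pool size.
# Slot data and the 0.25 fraction are made up for illustration.

def counted_pool_size(slots, constraint=None):
    """Number of slots that count toward the pool size used for quotas."""
    if constraint is None:
        return len(slots)
    return sum(1 for slot in slots if constraint(slot))

# Each usable slot is paired with a monitoring slot, so the raw count is
# twice the number of slots that can actually run jobs.
slots = []
for _ in range(4):
    slots.append({"IS_MONITOR_VM": False})
    slots.append({"IS_MONITOR_VM": True})

def not_monitor(slot):
    # Python stand-in for ( IS_MONITOR_VM =!= True ): a slot with the
    # attribute missing or False still counts.
    return slot.get("IS_MONITOR_VM") is not True

dynamic_fraction = 0.25
quota_unfiltered = dynamic_fraction * counted_pool_size(slots)
quota_filtered = dynamic_fraction * counted_pool_size(slots, not_monitor)
print(quota_unfiltered, quota_filtered)
```

Without the constraint the quota is computed against twice the usable pool, which is the overestimation described above.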
I haven't had a chance to consider your case carefully enough to make a specific
recommendation. If you continue to have trouble, I recommend opening a help
ticket with condor-admin@xxxxxxxxxxxx

--Dan

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/