Re: [Condor-users] Whole System Scheduling
- Date: Fri, 23 Oct 2009 11:35:05 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Whole System Scheduling
Jonathan D. Proulx wrote:
> Hi All,
>
> I've been trying to get whole system scheduling working on my pool
> for some months now, and it is becoming a rather critical issue.
> I've been basing my config off of http://nmi.cs.wisc.edu/node/1482
>
> Ideally I'd like:
>
> 1) Whole system jobs _must_ not run until they have the whole system
> 2) Non-"PriorityGroup" (predefined in config) jobs _should_ be
>    preempted when a "PriorityGroup" whole system job is scheduled
> 3) Whole system jobs _should_ be suspended until all single-slot
>    "PriorityGroup" jobs complete
>
> Point one is critical, as much of the code users are looking to
> schedule in this way is benchmark code that is only meaningful if the
> rest of the system is quiescent.
>
> Ignoring non-PriorityGroup users for now and simply trying to
> suspend the whole system job until all other slots are clear fails at
> MaxSuspendTime, which is understandable, except that the job does execute
> for some period of time before being killed and requeued (usually in
> the exact same slot).
The how-to recipe tries to merge itself with whatever existing
suspension policy you may have. In your case, the two are not really
compatible. I recommend getting rid of the "merge". In other words,
don't logically OR the existing $(SUSPEND) expression with the
whole-machine suspension expression.
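To illustrate (the macro name WHOLE_MACHINE_SUSPEND below is just a placeholder, not something the recipe defines), the difference is between the merged form

  SUSPEND = ($(SUSPEND)) || ($(WHOLE_MACHINE_SUSPEND))

and simply setting SUSPEND to the whole-machine expression on its own, which is what the config below does.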
Regarding #2 and #3, you might try something like the following. I
apologize that I have not had time to test this.
I assume the priority jobs have a ClassAd attribute named
PriorityGroup. For example, the jobs could be submitted with the
following line in the submit file, including the leading + sign:
+PriorityGroup = True
If you have some other scheme for identifying the priority jobs, it can
be used instead. Anyway, going ahead with the above assumption, here's
a config:
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) PriorityGroup RequiresWholeMachine
STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) PriorityGroup RequiresWholeMachine Activity
# Suspend the whole-machine job until the other slots are empty
SUSPEND = \
  (SlotID == 1 && Slot1_RequiresWholeMachine =?= True && \
   (Slot2_Activity =?= "Busy" || Slot3_Activity =?= "Busy" || ... ))

# Preempt non-PriorityGroup jobs when a PriorityGroup whole-machine job wants to run
PREEMPT = \
(SlotID != 1 && PriorityGroup =!= True && Slot1_PriorityGroup =?= True)
WANT_SUSPEND = $(SUSPEND)
CONTINUE = ( $(SUSPEND) =!= True )
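For completeness, a matching priority whole-machine job could then be
submitted with something like the following untested sketch; "benchmark"
is a placeholder executable name, and RequiresWholeMachine is the
attribute from the how-to recipe:

  universe   = vanilla
  executable = benchmark
  # attribute from the whole-machine how-to recipe
  +RequiresWholeMachine = True
  # attribute assumed above to mark priority jobs
  +PriorityGroup = True
  # plus whatever requirements expression the recipe uses to steer
  # the job to the whole-machine slot (slot 1)
  queue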
> Trying #4 I see that the whole system job gets scheduled on slot one
> and starts running. Slots 2 through N continue executing for 3 min (a
> negotiation cycle?) before they exit.
I'm not sure what the issue is. Let me know if you still have problems
using something along the lines of the above config.
> My fondest wish would be for Condor to be able to allocate multiple CPUs,
> so jobs could simply require some number (which they could if I
> configured a matrix of mutually exclusive slots, I guess, but as we get
> up into the world of 16 and more cores this gets crazy).
Agreed. This is the intention of the recently added dynamic slot support:
http://www.cs.wisc.edu/condor/manual/v7.2/3_12Setting_Up.html#SECTION004127900000000000000
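Roughly, you define one big partitionable slot and each job carves out
the cpus it asks for. An untested sketch (config knobs per the manual
section above; adjust the resource share to taste):

  # execute-node config: a single partitionable slot owning the whole machine
  NUM_SLOTS = 1
  NUM_SLOTS_TYPE_1 = 1
  SLOT_TYPE_1 = 100%
  SLOT_TYPE_1_PARTITIONABLE = True

and in the submit file the job states how many cpus it needs, e.g.:

  request_cpus = 8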
However, this feature currently does not provide a good solution for
"defragmenting". What I mean is that if there is a steady supply of
single-cpu jobs, then jobs requiring more than one cpu may never get
scheduled unless they are lucky and a bunch of single cpu jobs all exit
at the same time. One workaround is to enforce a periodic drain so that
each execute node stops accepting more jobs until all slots are idle.
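A crude way to express such a drain in config (a sketch only; DRAINING
is an invented knob that you would have to flip yourself, e.g. from cron
by rewriting a local config file and running condor_reconfig):

  # Invented knob: True while the node should drain, False otherwise.
  DRAINING = False
  # While draining, accept no new jobs but let the running ones finish.
  START = ($(START)) && ($(DRAINING) != True)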
--Dan