Re: [Condor-users] Whole System Scheduling
- Date: Fri, 23 Oct 2009 14:30:15 -0400
- From: "Jonathan D. Proulx" <jon@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Whole System Scheduling
Hi Dan,
Thanks for the detailed notes on the recipe. I'll go through again
with your suggested changes. I thought I'd tried replacing the SUSPEND
statement, but it may have been before I fixed another issue that was
giving me trouble.
Dynamic Provisioning sounds like a move in the right direction, but
I don't think I'll hang all my hopes there.
On Fri, Oct 23, 2009 at 11:35:05AM -0500, Dan Bradley wrote:
:
:
:Jonathan D. Proulx wrote:
:> My fondest wish would be for Condor to be able to allocate multiple CPUs and
:> jobs could simply require some number (which they could if I
:> configured a matrix of mutually exclusive slots I guess but as we get
:> up into the world of 16 and more cores this gets crazy)
:>
:Agreed. This is the intention of the recently added dynamic slot support:
:
:http://www.cs.wisc.edu/condor/manual/v7.2/3_12Setting_Up.html#SECTION004127900000000000000
:
:However, this feature currently does not provide a good solution for
:"defragmenting". What I mean is that if there is a steady supply of
:single-cpu jobs, then jobs requiring more than one cpu may never get
:scheduled unless they are lucky and a bunch of single cpu jobs all exit
:at the same time. One workaround is to enforce a periodic drain so that
:each execute node stops accepting more jobs until all slots are idle.
Periodic drain is a good idea.
Another issue I'm seeing, with my half hour of experience using dynamic
slots, is that they split slowly. Since the partitionable slot only
matches once per negotiation cycle (about 5 minutes on my test system),
it takes N * NegotiationCycle to fully populate an N-way system with
single-processor jobs (40 minutes on my system). This is also less
than optimal.
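
To make the arithmetic concrete, here is a rough sketch of the fill-time
estimate. The function name and the splits_per_cycle parameter are
hypothetical, just for illustration, and the 8-core figure is inferred
from 40 minutes / 5 minutes per cycle rather than stated outright:

```python
import math


def minutes_to_fill(cores, cycle_minutes, splits_per_cycle=1):
    """Estimate how long until every core of one partitionable slot
    is running a single-CPU job.

    Assumes the partitionable slot is matched splits_per_cycle times
    per negotiation cycle (1 matches the behaviour described above).
    """
    cycles = math.ceil(cores / splits_per_cycle)
    return cycles * cycle_minutes


# Assumed 8-way test machine with a ~5 minute negotiation cycle.
print(minutes_to_fill(8, 5))   # 40
# The same cycle length on a 16-core machine.
print(minutes_to_fill(16, 5))  # 80
```

The estimate grows linearly with core count, which is why this gets
worse as machines move to 16 and more cores.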