HTCondor Project List Archives




Re: [Condor-devel] Condor as a vm scheduler



On 01/13/2010 05:22 AM, Matt Hope wrote:
> -----Original Message-----
> From: Matthew Farrellee [mailto:matt@xxxxxxxxxx]
> Sent: 13 January 2010 04:21
> To: Matt Hope
> Cc: condor-devel@xxxxxxxxxxx
> Subject: Re: [Condor-devel] Condor as a vm scheduler
> 
> On 01/12/2010 01:07 PM, Matt Hope wrote:
>> The problem you raise is just when initially populating a pool?
>> Anything that can aggregate slots together will work. As a
>> bootstrap, you could assign a random number to each node and use it
>> as a component of your RANK.
> 
> If I have two jobs (on independent queues) that should not run
> 'against' each other, how, in either the dynamic or fixed slot
> model, can I handle this at the scheduler level? If you dole out
> only one job per cycle, and the collector gets updated between
> those cycles, then it will work (albeit with the overhead of
> waiting for that to occur); if not, an SMP machine may get both
> assigned to it (and then you kick one or the other after the
> event).
> 
> I see no way in the current system to achieve the desired
> behaviour without throttling or post-assignment checking and
> kicking. I'd love to know if there is a way to achieve it (I'm
> not clear how a random component can help me on this one?).
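
To answer the random-component question first: the idea is to give every job the same arbitrary-but-stable ordering over nodes, so matches pile onto the same machines first and slots aggregate. A minimal sketch (BootstrapSalt is a made-up attribute name, and this assumes the config-time $RANDOM_INTEGER macro):

   # condor_config on each execute node: publish a random value
   # picked once at config time
   BootstrapSalt = $RANDOM_INTEGER(0, 1000000)
   STARTD_ATTRS = $(STARTD_ATTRS) BootstrapSalt

   # job submit file: prefer nodes with a higher salt
   rank = BootstrapSalt

It's only a bootstrap for initial pool population, not a guarantee, and it does nothing for keeping two specific jobs apart.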

Without using dynamic slots, you pretty much have to do the post-processing. Some information is available to policy during matching, but only in aggregate - you don't get to ask "did job X also get matched to this exec node?".
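
For the post-processing, one startd-side sketch for a two-slot machine (JobFamily is a made-up attribute, set per job with +JobFamily = "A" in the submit file):

   # copy the job attribute into the slot ad while the job runs...
   STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) JobFamily
   # ...and cross-publish each slot's copy to the other slots
   STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) JobFamily

   # a slot refuses a "B" job while slot1 runs an "A" job...
   START = ($(START)) && \
     (TARGET.JobFamily =!= "B" || slot1_JobFamily =!= "A")
   # ...and kicks one if both land in the same negotiation cycle
   PREEMPT = ($(PREEMPT)) || \
     (TARGET.JobFamily =?= "B" && slot1_JobFamily =?= "A")

Note the race: both jobs can still match within one cycle, which is exactly why the PREEMPT (the kicking) is needed.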

There's still debate about what information should be available in the negotiator, e.g. whether dynamic slots can be created by actions the negotiator takes, or only by the startd.
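
For reference, turning on dynamic (partitionable) slots is just startd config, roughly:

   # one partitionable slot owning the whole machine; the startd
   # carves off a dynamic slot sized to each job that matches
   NUM_SLOTS = 1
   NUM_SLOTS_TYPE_1 = 1
   SLOT_TYPE_1 = 100%
   SLOT_TYPE_1_PARTITIONABLE = TRUE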


>> It's pretty much embedding a specific mode of operation into the
>> scheduler, in my view - something the default scheduler tries to
>> do for only very few things, fair-share being a notable example.
> 
> The scheduler has some fixed internal state it maintains about
> what it has done already within the negotiation cycle. To my
> knowledge that is inaccessible to any user-level component of the
> scheduler; this is the problem as I see it.

It's only accessible in aggregate, yes.
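
Concretely, the aggregate information is the kind of thing negotiator policy can already see - per-submitter priorities and usage totals - e.g. the stock preemption knob:

   # negotiator config: compares aggregate user priorities; says
   # nothing about which jobs matched which machines
   PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmitterUserPrio * 1.2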


>> I say default scheduler, since I think Condor has a nice
>> infrastructure in which different scheduling algorithms could be
>> placed. The functionality to do that just isn't in place right
>> now.
> 
> It would be far more pleasant to be able to drop in an alternate
> scheduler (with no assumptions about schedds whatsoever) than to
> distribute the process through job hooks (even if the distributed
> system may well be faster in some circumstances, the difference
> at our scale would not be meaningful).
> 
> 
>> It's great that you're at a point where you can fine-tune for
>> throughput so much. When I was last looking at the startd cron
>> code, I wondered why it wasn't able to actively trigger an
>> update. Would something like that get you closer to a collector
>> with a more current view of the pool?
> 
> Not really (though it sounds sensible as a general rule); the
> issue is the *intra* negotiation cycle, rather than between
> negotiation cycles.
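
Fair enough - startd cron only helps between cycles. For reference, it's just periodic probes merged into the machine ad, e.g. (job name and path made up):

   # condor_config on the execute node
   STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) PROBE
   STARTD_CRON_PROBE_EXECUTABLE = /usr/local/bin/probe.sh
   STARTD_CRON_PROBE_PERIOD = 60s
   # probe.sh prints ClassAd attributes, one "Name = Value" per
   # line; they merge into the slot ad and go out with the next
   # regular collector update - hence my question about triggering
   # an update actively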

Do you often go from 0 to 100 in your workflows? Is it more HPC like?


Best,


matt