
Re: [Condor-devel] Condor as a vm scheduler



-----Original Message-----
From: Matthew Farrellee [mailto:matt@xxxxxxxxxx] 
Sent: 13 January 2010 04:21
To: Matt Hope
Cc: condor-devel@xxxxxxxxxxx
Subject: Re: [Condor-devel] Condor as a vm scheduler

On 01/12/2010 01:07 PM, Matt Hope wrote:
> The problem you raise is just when initially populating a pool? Anything that can aggregate slots together will work. 
> As a bootstrap, you could assign a random number to each node and use it as a component of your RANK.
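
(For concreteness, that bootstrap might look something like the following in condor_config; the attribute name BootstrapRand is hypothetical:)

    # Sketch of the random-RANK bootstrap suggested above. Each startd
    # publishes a random value once per (re)configuration, and jobs rank
    # machines by it, so an empty pool fills in a scattered order rather
    # than every job chasing the same "best" machine.
    BootstrapRand = $RANDOM_INTEGER(0, 1000000)
    STARTD_ATTRS  = $(STARTD_ATTRS) BootstrapRand

    # and in the job submit description:
    rank = BootstrapRand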

If I have two jobs (on independent queues) which would like to not run 'against' each other, how, in either the dynamic or the fixed slot model, can I handle this at the scheduler level?
If you dole out only one job per cycle, and the collector gets updated between those cycles, then it will work (albeit with the overhead of waiting for that to occur); if not, an SMP machine may get both jobs assigned to it (and you then kick one or the other after the event).

I see no way, in the current system, to achieve the desired behaviour without throttling or post-assignment checking and kicking. I'd love to know if there is a way to achieve it (I'm not clear how a random component can help me on this one).
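
(To make the failure mode concrete: the obvious submit-file attempt looks something like the sketch below, where RunningTypeB is a hypothetical machine attribute we would have to publish ourselves, e.g. via startd cron. It only reflects the collector's last update, so two matches made inside one negotiation cycle can both see RunningTypeB == 0 and land on the same SMP box.)

    # Submit description for a type-A job (sketch; RunningTypeB is a
    # hypothetical machine attribute, not a built-in one).
    universe     = vanilla
    executable   = job_a.bat
    # Stale between collector updates: useless for intra-cycle exclusion.
    requirements = (RunningTypeB == 0)
    queue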

> It's pretty much embedding a specific mode of operation into the scheduler in my view. Something that the default scheduler tries to
> only do for very few things, fair-share being a notable example.

The scheduler has some fixed internal state it maintains about what it has done already within the negotiation cycle. To my knowledge that state is inaccessible to any user-level component of the scheduler; this is the problem as I see it.

> I say default scheduler, since I think Condor has a nice infrastructure in which different scheduling algorithms could be placed. 
> The functionality to do that just isn't in place right now.

It would be far more pleasant to be able to drop in an alternate scheduler (with no assumption of schedds whatsoever) than to distribute the process through job hooks (even if the distributed system may well be faster in some circumstances, the difference at our level would not be meaningful).
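
(For reference, the job-hook route being compared against would look roughly like this in condor_config; the FOO keyword and script paths are hypothetical:)

    # Sketch of the startd fetch-work hook mechanism. The startd
    # periodically invokes the fetch hook, which prints a job ClassAd
    # on stdout when it has work to hand out.
    STARTD_JOB_HOOK_KEYWORD = FOO
    FOO_HOOK_FETCH_WORK     = C:\condor\hooks\fetch_work.bat
    FOO_HOOK_REPLY_FETCH    = C:\condor\hooks\reply_fetch.bat
    # How long (in seconds) a slot waits between fetch attempts.
    FetchWorkDelay          = 45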


> It's great that you're at a point where you can fine tune for throughput so much. When I was last looking at the startd cron code 
> I was wondering why it wasn't able to actively trigger an update. Would something like that get you closer to a collector with a more current view of the pool?

Not really (though it sounds sensible as a general rule); it's the *intra*-negotiation-cycle issues rather than inter-cycle ones.
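
(For anyone following along, a startd cron job is configured along these lines; the LOADPROBE name and script path are hypothetical:)

    # Sketch of a startd cron job that publishes extra machine
    # attributes. It runs on a fixed period; it cannot push an ad to
    # the collector on demand, which is the limitation discussed above.
    STARTD_CRON_JOBLIST              = LOADPROBE
    STARTD_CRON_LOADPROBE_EXECUTABLE = C:\condor\probes\loadprobe.bat
    STARTD_CRON_LOADPROBE_PERIOD     = 60s
    STARTD_CRON_LOADPROBE_MODE       = Periodic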

> Forgive me. I forget how many machines you have in your pool(s)? A good portion are Windows boxes?

All are Windows: about 150 machines with 640 slots (we might go higher; we're memory- rather than CPU-constrained due to fixed partitioning of boxes to ensure throughput of high memory/disk utilization jobs).
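
(The fixed partitioning referred to is the usual static slot-type configuration, roughly as below; the sizes are illustrative, not the actual production values:)

    # Sketch of static slot partitioning: one large slot reserved for
    # high memory/disk jobs, the rest of the box split evenly.
    SLOT_TYPE_1      = cpus=1, memory=8192, disk=25%
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_2      = cpus=1, memory=2048, disk=auto
    NUM_SLOTS_TYPE_2 = 3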

Matt
