Mailing List Archives
Re: [Condor-users] Deployment Recommendations
- Date: Sun, 26 Apr 2009 15:06:02 -0400
- From: Preston Smith <psmith@xxxxxxxxxx>
- Subject: Re: [Condor-users] Deployment Recommendations
On Apr 23, 2009, at 8:01 AM, James Osborne wrote:
> Dear All
>
> My name is James Osborne and I am the Condor Project Manager at
> Cardiff University in the UK. Now that summer is approaching, and I
> have some nice new virtualization infrastructure coming on stream, I
> am in the process of virtualizing our Condor infrastructure. I
> already have a virtual submit machine which works very well with
> surprisingly low overhead (I couldn't push it above about 4%
> CPU usage with thousands of 15-minute jobs in the queue). The
> virtualization infrastructure will soon be a load-balanced pair of
> 3GHz dual-socket quad-core machines with 32GB of RAM each, with
> multiple redundant connections into FC storage.
>
> I seem to remember hearing that a good 'rule of thumb' was to have
> no more than 2000 execute nodes reporting to a single central manager.
>
> 1) Is that still the case?

It was, a few years ago. Today, a single one of our pools has nearly 9,000
execute slots. If your 9,000 slots are Windows machines, you'll probably want
to make sure to use TCP updates to the collector.
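To make the TCP suggestion concrete: in Condor configuration of this era, daemons send collector updates over TCP when `UPDATE_COLLECTOR_WITH_TCP = True`, and the collector needs a TCP socket cache enabled via `COLLECTOR_SOCKET_CACHE_SIZE`. The Python sketch below only renders those two lines for a distribution script. The 10% headroom in the cache size, and reusing the 9,000 slot count from this thread as the number of reporting hosts, are assumptions for illustration, not documented rules; check the manual for your Condor version before relying on them.

```python
def execute_node_tcp_fragment() -> str:
    """Config line for execute nodes: send ClassAd updates to the collector over TCP."""
    return "UPDATE_COLLECTOR_WITH_TCP = True\n"


def collector_tcp_fragment(reporting_hosts: int, headroom_pct: int = 10) -> str:
    """Config line for the central manager's collector.

    ASSUMPTION: sizing the socket cache to the number of reporting hosts plus
    headroom_pct percent is illustrative only, not an official recommendation.
    """
    if reporting_hosts < 0 or headroom_pct < 0:
        raise ValueError("counts must be non-negative")
    # Integer ceiling of reporting_hosts * (1 + headroom_pct / 100), avoiding floats.
    cache_size = (reporting_hosts * (100 + headroom_pct) + 99) // 100
    return f"COLLECTOR_SOCKET_CACHE_SIZE = {cache_size}\n"


if __name__ == "__main__":
    print(execute_node_tcp_fragment(), end="")
    print(collector_tcp_fragment(9000), end="")
```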
> 2) Has anybody pushed a single central manager to about 9000 execute
> nodes?
>
> 3) Does it make more sense to deploy 4-5 central managers instead
> and use flocking?

It does help in some instances, such as logically separating machines by
administrative domain or some other criterion, but it will also make your
environment more complicated. We have many cores at Purdue, most of which
are in three pools, plus several other smaller, flocked pools.
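For a sense of scale, here is the arithmetic behind the old 2000-node rule of thumb applied to 9000 execute nodes. Keeping every central manager at or under 2000 needs ceil(9000 / 2000) = 5 managers, which is consistent with the 4-5 proposed in question 3; with only 4, each would carry 2250. The Python sketch below uses integer ceiling division. Note that the reply above suggests the 2000 cap itself is out of date, so treat it as an input rather than a hard limit.

```python
def managers_needed(execute_nodes: int, max_per_manager: int) -> int:
    """Smallest number of central managers that keeps each at or under the cap.

    Uses integer ceiling division, so no floating-point rounding is involved.
    """
    if max_per_manager <= 0:
        raise ValueError("max_per_manager must be positive")
    return -(-execute_nodes // max_per_manager)


def per_manager_load(execute_nodes: int, managers: int) -> int:
    """Largest number of nodes any one manager carries under an even split."""
    if managers <= 0:
        raise ValueError("managers must be positive")
    return -(-execute_nodes // managers)


if __name__ == "__main__":
    print(managers_needed(9000, 2000))  # 5
    print(per_manager_load(9000, 4))    # 2250, above the 2000 rule of thumb
    print(per_manager_load(9000, 5))    # 1800
```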
> 4) If so, would you for example use one central manager per core
> network router even if that increased the number of managers to 8 or
> more?

I try to group them:
- a pool for all sorts of distributed machines around campus,
- a pool of HPC cluster nodes with external WAN connectivity, and
- a pool of cluster nodes that are on private IP space.
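That three-way grouping can be expressed as a small classification step in a deployment script. This Python sketch assumes a hypothetical inventory record per execute node carrying its IP address and an `is_cluster_node` flag, uses the RFC 1918 ranges as the test for "private IP space", and treats any cluster node with a routable address as having WAN connectivity. The pool names and hostnames are placeholders, and the IPs come from documentation ranges.

```python
import ipaddress
from dataclasses import dataclass

# RFC 1918 private IPv4 ranges, used here as the test for "private IP space".
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# HYPOTHETICAL pool names; substitute your own.
CAMPUS_POOL = "campus"
CLUSTER_WAN_POOL = "cluster-wan"
CLUSTER_PRIVATE_POOL = "cluster-private"


@dataclass
class ExecuteNode:
    hostname: str
    ip: str
    is_cluster_node: bool  # HYPOTHETICAL inventory flag: member of an HPC cluster?


def is_rfc1918(ip: str) -> bool:
    """True if the address falls in one of the RFC 1918 private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETS)


def choose_pool(node: ExecuteNode) -> str:
    """Place a node in one of three pools, following the grouping in the reply."""
    if not node.is_cluster_node:
        return CAMPUS_POOL  # distributed machines around campus
    if is_rfc1918(node.ip):
        return CLUSTER_PRIVATE_POOL  # cluster nodes on private IP space
    # ASSUMPTION: a routable address stands in for external WAN connectivity.
    return CLUSTER_WAN_POOL


if __name__ == "__main__":
    nodes = [
        ExecuteNode("lab-pc-01", "192.0.2.10", False),
        ExecuteNode("hpc-a-001", "198.51.100.7", True),
        ExecuteNode("hpc-b-001", "10.1.2.3", True),
    ]
    for node in nodes:
        print(node.hostname, choose_pool(node))
```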
> 5) Has anybody tried to flock jobs to 8 or more central managers?

Yep.
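On the submit side, flocking to several pools largely comes down to listing their central managers in the submit machine's `FLOCK_TO` setting; each remote pool must also admit that submit machine (via `FLOCK_FROM` and its security settings). As we understand the Condor manual, the schedd tries the pools in the order listed, so this Python sketch keeps the caller's order while dropping duplicates. The central manager hostnames are hypothetical.

```python
from typing import Iterable


def render_flock_to(central_managers: Iterable[str]) -> str:
    """Render a FLOCK_TO line for a submit machine's local config.

    Blank entries and duplicates are dropped, but first-seen order is kept,
    because the list order is the order in which remote pools are tried.
    """
    cleaned = (cm.strip() for cm in central_managers)
    ordered = list(dict.fromkeys(cm for cm in cleaned if cm))
    if not ordered:
        raise ValueError("need at least one central manager to flock to")
    return "FLOCK_TO = " + ", ".join(ordered) + "\n"


if __name__ == "__main__":
    # HYPOTHETICAL central manager hostnames for eight pools.
    managers = [f"cm{i}.example.ac.uk" for i in range(1, 9)]
    print(render_flock_to(managers), end="")
```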
> I can already see how I can set execute nodes to report to different
> central managers in my Condor distribution scripts.
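For distribution scripts like these, one way to point an execute node at a particular central manager is to write `CONDOR_HOST` into that node's local configuration, since `CONDOR_HOST` normally supplies the collector and negotiator host. This Python sketch assumes a hypothetical pool-to-manager mapping and a hypothetical local config path; it only renders and writes the fragment.

```python
from pathlib import Path
from typing import Mapping

# HYPOTHETICAL mapping from pool name to that pool's central manager.
CENTRAL_MANAGERS = {
    "campus": "cm-campus.example.ac.uk",
    "cluster-wan": "cm-wan.example.ac.uk",
    "cluster-private": "cm-private.example.ac.uk",
}


def render_local_config(pool: str, managers: Mapping[str, str]) -> str:
    """Config fragment pointing this execute node at its pool's central manager."""
    if pool not in managers:
        raise ValueError(f"no central manager configured for pool {pool!r}")
    return f"CONDOR_HOST = {managers[pool]}\n"


def write_local_config(path: Path, pool: str, managers: Mapping[str, str]) -> None:
    """Write the fragment to disk.

    HYPOTHETICAL destination: whichever local config file your install reads.
    """
    path.write_text(render_local_config(pool, managers))


if __name__ == "__main__":
    print(render_local_config("campus", CENTRAL_MANAGERS), end="")
```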
>
> I look forward to hearing from those of you with big pools...
>
> Thanks in advance. Best regards
> James

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/