[Condor-users] Deployment Recommendations
- Date: Thu, 23 Apr 2009 13:01:49 +0100
- From: James Osborne <OsborneJA1@xxxxxxxxxxxxx>
- Subject: [Condor-users] Deployment Recommendations
Dear All
My name is James Osborne and I am the
Condor Project Manager at Cardiff University in the UK. Now that summer
is approaching, and I have some nice new virtualization infrastructure
coming on stream, I am in the process of virtualizing our Condor infrastructure.
I already have a virtual submit machine which works very well with
surprisingly low overhead (I couldn't push it harder than about 4% CPU
usage with thousands of 15-minute jobs in the queue). The virtualization
infrastructure will soon be a load-balanced pair of 3GHz dual-socket quad-core
machines, each with 32GB of RAM and multiple redundant connections into
FC storage.
I seem to remember hearing that a good
'rule of thumb' was to have no more than 2000 execute nodes reporting
to a single central manager.
1) Is that still the case?
2) Has anybody pushed a single central manager to about 9000 execute nodes?
3) Does it make more sense to deploy 4-5 central managers instead and use
flocking?
4) If so, would you for example use one central manager per core network
router even if that increased the number of managers to 8 or more?
5) Has anybody tried to flock jobs to 8 or more central managers?
I can already see how I can set execute
nodes to report to different central managers in my Condor distribution
scripts.
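To make the multi-manager option concrete, here is a rough sketch of what such a distribution script might generate. It is only an illustration under stated assumptions: the hostnames are made up, the 2000-node figure is the rule of thumb quoted above rather than a verified limit, and the helper names are hypothetical. CONDOR_HOST and FLOCK_TO are the standard Condor configuration macros for naming a node's central manager and listing the pools a submit machine may flock to.

```python
import math

# Rule of thumb quoted in this message; an assumption, not a measured limit.
NODES_PER_MANAGER = 2000


def managers_needed(total_nodes, per_manager=NODES_PER_MANAGER):
    """Smallest number of central managers that keeps each under the limit."""
    return math.ceil(total_nodes / per_manager)


def execute_node_config(manager_host):
    """Config fragment pointing an execute node at its central manager."""
    return f"CONDOR_HOST = {manager_host}\n"


def submit_node_config(home_manager, all_managers):
    """Config fragment for a submit machine: report to one manager and
    flock to all the others."""
    others = [m for m in all_managers if m != home_manager]
    lines = [f"CONDOR_HOST = {home_manager}"]
    if others:
        lines.append("FLOCK_TO = " + ", ".join(others))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical manager hostnames, one per group of execute nodes.
    count = managers_needed(9000)
    managers = [f"cm{i}.example.org" for i in range(1, count + 1)]
    print(f"{count} central managers for 9000 execute nodes")
    print(execute_node_config(managers[0]), end="")
    print(submit_node_config(managers[0], managers), end="")
```

With 9000 execute nodes and the 2000-node rule of thumb this gives five managers, which matches the 4-5 range in question 3. For flocking to work, each remote central manager would also need the submit machine listed in its FLOCK_FROM setting and suitable host authorization; those parts are not shown here.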
I look forward to hearing from those
of you with big pools...
Thanks in advance. Best regards
James