Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Deployment Recommendations

Date: Sun, 26 Apr 2009 15:06:02 -0400
From: Preston Smith <psmith@xxxxxxxxxx>
Subject: Re: [Condor-users] Deployment Recommendations


On Apr 23, 2009, at 8:01 AM, James Osborne wrote:

Dear All
My name is James Osborne and I am the Condor Project Manager at Cardiff University in the UK. Now that summer is approaching, and I have some nice new virtualization infrastructure coming on stream, I am in the process of virtualizing our Condor infrastructure. I already have a virtual submit machine which works very well with surprisingly low overhead (I couldn't push it harder than about 4% cpu usage with 000s of 15 minute jobs in the queue). The virtualization infrastructure will soon be a load-balanced pair of 3GHz dual-socket quad-core machines with 32GB of RAM each with multiple redundant connections into FC storage.
I seem to remember hearing that a good 'rule of thumb' was to have no more than 2000 execute nodes reporting to a single central manager.
1) Is that still the case ?

That was the case a few years ago. Today, one of our single pools has nearly 9k execute slots. If your 9000 slots are Windows, you'll probably want to make sure to use TCP updates to the collector.
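
As a rough sketch of what enabling TCP updates can look like, here is a configuration fragment. The knob names are the ones documented in the Condor manual, but the cache size is an illustrative assumption and not a tested recommendation; check the manual for your Condor version before using it.

```
# On the execute nodes: send ClassAd updates to the collector over TCP
# instead of UDP.
UPDATE_COLLECTOR_WITH_TCP = True

# On the central manager: the collector keeps a cache of open TCP sockets.
# It should be sized for the number of daemons that will update it.
# 10000 is an assumed, illustrative value.
COLLECTOR_SOCKET_CACHE_SIZE = 10000
```

The collector holds one socket per updating daemon, so the file descriptor limit on the central manager likely needs to be raised to match.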

2) Has anybody pushed a single central manager to about 9000 execute nodes ?
3) Does it make more sense to deploy 4-5 central managers instead and use flocking ?

It does help in some instances, such as logically separating machines by administrative domain or some other criterion, but it will also make your environment more complicated. We have many cores at Purdue, most of which are in 3 pools, along with several other smaller, flocked pools.
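
For readers who have not set up flocking before, a minimal sketch of the configuration involved follows. All hostnames are hypothetical placeholders, and a real deployment also needs matching authorization settings on the remote pools, which are not shown here.

```
# On the submit machine: central managers of the pools to flock to, tried
# in order when the local pool cannot run the jobs.
# (Hypothetical hostnames.)
FLOCK_TO = cm-cluster.example.edu, cm-private.example.edu

# On each remote pool (central manager and execute nodes): submit machines
# that are allowed to flock in.
# (Hypothetical hostname.)
FLOCK_FROM = submit.example.edu
```

Each additional pool adds one entry to FLOCK_TO on the submit side and one FLOCK_FROM entry to maintain on the remote side, which is part of the added complexity mentioned above.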

4) If so, would you for example use one central manager per core network router even if that increased the number of managers to 8 or more ?

I try to group them:

   a pool for all sorts of distributed machines around campus,
   a pool of HPC cluster nodes with external WAN connectivity,
   and a pool of cluster nodes that are on private IP space.


5) Has anybody tried to flock jobs to 8 or more central managers ?


 Yep.

I can already see how I can set execute nodes to report to different central managers in my Condor distribution scripts.
I look forward to hearing from those of you with big pools...
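
Purely as an illustration of the kind of distribution script mentioned above, the sketch below picks a central manager for an execute node from its IP address and renders the matching local configuration line. The network ranges, hostnames, and function names are all hypothetical; only CONDOR_HOST is a real Condor configuration macro.

```python
import ipaddress

# Hypothetical mapping of network ranges to central managers. Every address
# and hostname here is a placeholder, not a real deployment.
POOLS = [
    (ipaddress.ip_network("10.1.0.0/16"), "cm-campus.example.ac.uk"),
    (ipaddress.ip_network("10.2.0.0/16"), "cm-cluster.example.ac.uk"),
]
DEFAULT_MANAGER = "cm-campus.example.ac.uk"


def central_manager_for(node_ip):
    """Return the central manager for the first network containing node_ip."""
    addr = ipaddress.ip_address(node_ip)
    for network, manager in POOLS:
        if addr in network:
            return manager
    # Nodes outside every listed range fall back to the default pool.
    return DEFAULT_MANAGER


def local_config_for(node_ip):
    """Render the local config line that points a node at its central manager."""
    return "CONDOR_HOST = {}\n".format(central_manager_for(node_ip))


if __name__ == "__main__":
    print(local_config_for("10.2.3.4"), end="")
```

Grouping by network range is only one option; the same lookup could equally be keyed on administrative domain or machine type.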

Thanks in advance.  Best regards

James
