
[Condor-users] Setup Advice Needed

Date: Fri, 12 Nov 2010 10:02:00 -0600
From: "Craig A. Struble, Ph.D." <craig.struble@xxxxxxxxxxxxx>
Subject: [Condor-users] Setup Advice Needed

At Marquette, our Condor pools have been growing and we seem to be at a tipping point in terms of performance. We have recently configured the job router on our primary cluster to route jobs to our other pools across campus using Condor-C (flocking isn't really an option), giving us over 1600 available slots.

Our current Condor 7.4.4 setup has the collector, negotiator, job router and schedd all running on the head node (an 8-core machine with 24 GB of RAM, 2 x 1 Gb/s network interfaces, and 1 x 20 Gb/s InfiniBand). When we launch a few thousand jobs capable of being routed, the system is fine for a while, but eventually the schedd becomes unresponsive and the overall head node load skyrockets due to the number of running shadow daemons.

Should we consider partitioning our Condor daemons onto different nodes? What partitioning works best? Would a second schedd, to handle the routed jobs, be helpful? What have others done and what seems to work well?

Thanks.

    Craig
--
Craig A. Struble, Ph.D. | Marquette University
Associate Professor of Computer Science | 369 Cudahy Hall
(414)288-3783 | (414)288-5472 (fax)
http://www.mscs.mu.edu/~cstruble | craig.struble@xxxxxxxxxxxxx
