
Re: [HTCondor-users] load-balanced central manager?



Hi,

Is it possible to run Condor with multiple instances of every
component (including the negotiator) in a pool, and in this way
provide a load-balanced, fault-tolerant environment? Or can only one
negotiator be active in a pool at a time (I know it's possible to do
fail-over with HAD)?
It is possible to do fail-over with HAD, but it picks one negotiator
to be running at any one time.  Should the currently active negotiator
go down, it will pick another to start.  Note that if the negotiator or
the collector crashes, all existing jobs stay running, and the schedds
will even start new jobs if they can re-use the claims they already
have.  Separately, it is also possible to tell a negotiator that it is
responsible for some subset of the machines in the pool, so that it
only provides matches to those machines.
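As a rough sketch of the HAD setup (not a tested configuration): the host
names and the port number below are placeholders, replication of the
negotiator state is left out, and older releases also need HAD_ARGS, so
check the High Availability section of the manual for your version.

```
# On every central manager candidate; HAD_LIST must be identical,
# and in the same order, on all of them.
CENTRAL_MANAGER1 = cm1.example.org      # placeholder host name
CENTRAL_MANAGER2 = cm2.example.org      # placeholder host name
COLLECTOR_HOST   = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)

HAD_PORT = 51450                        # arbitrary free port
HAD_LIST = $(CENTRAL_MANAGER1):$(HAD_PORT),$(CENTRAL_MANAGER2):$(HAD_PORT)

# Let HAD decide which host runs the single active negotiator.
HAD_CONTROLLEE = NEGOTIATOR
MASTER_NEGOTIATOR_CONTROLLER = HAD
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD
```

Restricting a negotiator to a subset of machines can be done with
NEGOTIATOR_SLOT_CONSTRAINT.  The "Rack" attribute here is a hypothetical
custom machine attribute, used only for illustration:

```
# Only consider slots whose ClassAd matches this expression.
NEGOTIATOR_SLOT_CONSTRAINT = (Rack == "A")
```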
I've read about flocking also. So in that way there'd be a number of
pools available with their own central managers. What happens before a
job gets flocked?
Before a job can be flocked, it has to fail to match in the local pool (either due to load or a conflict between job and machine requirements).
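For reference, flocking is enabled with a pair of settings along these
lines.  This is a minimal sketch with placeholder host names; the remote
pool's ALLOW_* security settings also have to admit the flocking schedd.

```
# On the submit machine (schedd) in the local pool:
# central manager(s) of the pool(s) to try when no local match is found.
FLOCK_TO = cm.pool-b.example.org

# In the remote pool: submit machines allowed to flock in.
FLOCK_FROM = submit.pool-a.example.org
```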
Does flocking help to provide some kind of load
balancing between several central managers? Or does it make the
situation even worse because it requires extra work from central
managers?
Generally speaking, there isn't a huge load on the central manager,
except in the largest of pools, and even then, claim reuse helps
tremendously.  What can be a problem for the central managers is when
they need to communicate with schedds over high-latency WAN links,
especially when strong security is enabled.
-greg