
Re: [HTCondor-users] can Condor somehow be a HA?




On 22/05/17 17:02, Todd Tannenbaum wrote:
On 5/22/2017 6:45 AM, lejeczek wrote:
hi fellas

I've only just started looking at HTCondor and don't have a good understanding of it yet. HTCondor has the concept of a "central manager", and I wonder whether that makes it a valid candidate for an HA setup?
Does anybody have any experience with, or thoughts on, HTCondor as HA that they could share here?
many thanks
L.
Hi,

First off, understand that if your installation's central manager dies, currently running jobs will continue to run, and in many cases even new jobs will continue to get scheduled (i.e. new jobs will still get scheduled to claimed slots). Even in production pools, most sites have no problem with rebooting their central manager or even taking it down for an hour or two. While the central manager is down, users may notice that condor_status stops working, but practically all other common tools continue to work (condor_submit, condor_q, condor_rm, etc). Thus many pools never bother with an HA solution for the central manager.
If you are still concerned, the HTCondor central manager 
is actually very lightweight and holds very little state 
(just user priorities), so it is very amenable to a 
high availability (HA) setup.  You essentially have two 
choices:
1. HTCondor can be configured to have two central managers 
(hot/hot), and automatically fail over as needed.  See the 
section in the HTCondor Manual titled "High Availability 
of the Central Manager" at
http://research.cs.wisc.edu/htcondor/manual/v8.6/3_13High_Availability.html#SECTION004132200000000000000 

2. If you already run your services in a managed 
virtualized setup (Mesos+Marathon, OpenStack, vSphere, 
Hyper-V, etc) that supports failover, you could set up your 
HTCondor central manager for HA leveraging those 
environments, i.e. the same way you would set up a redundant 
email server, for instance.
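
As a rough illustration of option 1, a configuration fragment 
along the following lines would go on both central manager 
machines.  Treat it as a sketch rather than a tested setup: 
the host names cm1.example.org and cm2.example.org are 
placeholders, the port numbers are arbitrary choices, and 
the manual section linked above covers settings omitted here 
(timeouts, the replicated state file, and how each daemon is 
told to listen on its port).

```
# Placeholder host names for the two central managers (assumptions).
CENTRAL_MANAGER1 = cm1.example.org
CENTRAL_MANAGER2 = cm2.example.org

# Every machine in the pool should list both central managers.
CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)

# Ports for the HA and replication daemons (arbitrary choices).
HAD_PORT = 51450
REPLICATION_PORT = 41450

# The same ordered lists must appear on both central managers.
HAD_LIST = $(CENTRAL_MANAGER1):$(HAD_PORT),$(CENTRAL_MANAGER2):$(HAD_PORT)
REPLICATION_LIST = $(CENTRAL_MANAGER1):$(REPLICATION_PORT),$(CENTRAL_MANAGER2):$(REPLICATION_PORT)

# Replicate the small amount of state (user priorities) between the two.
HAD_USE_REPLICATION = True

# Let the HA daemon decide which machine runs the active negotiator.
MASTER_NEGOTIATOR_CONTROLLER = HAD

# Daemons started by condor_master on each central manager.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
```

After changing the configuration, running condor_status 
against each central manager in turn is a simple way to 
confirm that both collectors are answering.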

Hope the above helps
Todd


Thanks, that sheds a lot of light on things for a novice like myself.