On 26/05/17 16:03, Todd Tannenbaum wrote:
On 5/26/2017 6:20 AM, lejeczek wrote:
Hi everybody,
like earlier, a newbie here, trying to grasp all the
concepts Condor (may) offer.
Question:
Can a pool be configured with central manager(s) (in an
HA setup) where only the central manager(s) accept
submissions and the rest of the pool is on a
different subnet?
You can probably easily envision what I'm asking: the HA
managers are on 10.0.0.0, which users can see; the rest is
on 10.1.0.0, which users can't see, so they submit only via
the central managers' 10.0.0.x addresses.
Would such a setup work, and is it allowed by the design?
Many thanks,
L.
I'm not quite sure what you are asking above, but I think
you may have missed an important point regarding the
architecture of an HTCondor pool.
In an HTCondor pool, there is:
1. one central manager (CM), and one or more optional
backup CMs if you opt for the HA setup.
2. one or more submit machines.
3. one or more execute machines.
Any machine can serve one, two, or all three of the above
roles simply based on which daemons are listed in
DAEMON_LIST. Your central manager(s) do NOT have to be
the same as your submit machines. Any machine in the
pool that runs the "condor_schedd" daemon can act as a
submit machine; just add SCHEDD to the DAEMON_LIST
config knob. Especially for larger pools, it is a good
idea to have dedicated machine(s) for each role: one
central manager, one or more submit machines, and one or
more execute machines. You can have as many submit
machines as you want. Here at UW, we have a pool with 1
central manager, ~500 execute machines, and 80+ submit
machines, because many submit machines are embedded
within various research labs and only have logins for the
researchers in that lab. Meanwhile, our central manager
is located in the centralized IT data center. If one of
the 80+ submit machines goes down, only the jobs
submitted on that one submit machine are impacted; all
the other submit machines continue to operate as normal.
More details are at
http://research.cs.wisc.edu/htcondor/manual/v8.7/3_1Introduction.html#SECTION00411000000000000000
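As a rough sketch of the dedicated-role layout described above, the local config on each kind of machine could differ only in its DAEMON_LIST line. The host name cm.example.org is a placeholder, and a real pool also needs security/ALLOW settings that are not shown here.

```
# Hypothetical example: cm.example.org is a placeholder host name.
# Common to every machine in the pool:
CONDOR_HOST = cm.example.org

# In the local config of the central manager:
#   DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# In the local config of a submit machine (SCHEDD is what makes it one):
#   DAEMON_LIST = MASTER, SCHEDD

# In the local config of an execute machine:
#   DAEMON_LIST = MASTER, STARTD
```

A single machine serving several roles would simply list all the relevant daemons in one DAEMON_LIST.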
If you are asking "can I have a submit machine that is on
two networks, a public network where users log in via
ssh, and a private network that holds all my execute
nodes and my central manager?", the answer is yes.
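A minimal sketch of such a dual-homed submit machine, assuming the HTCondor daemons should use the private network (ssh logins on the public side are handled by sshd, not HTCondor). The address 10.1.0.5 and the host name cm.private.example are hypothetical placeholders.

```
# Hypothetical dual-homed submit machine; all values are placeholders.
CONDOR_HOST = cm.private.example
DAEMON_LIST = MASTER, SCHEDD
# Have HTCondor use/advertise the private-network address;
# users still reach this machine over the public network via ssh.
NETWORK_INTERFACE = 10.1.0.5
```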
Hope the above helps
Todd
I was asking because I set up the following:
HA with two central managers, each with a specific
NETWORK_INTERFACE (on subnet A), and then an execute node
(with only DAEMON_LIST = MASTER, STARTD) pointing to
CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2) but
on a different subnet, not the CMs' NETWORK_INTERFACE
(subnet B, to which the central managers are also connected).
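The setup just described might look roughly like the fragments below. Host names and addresses are placeholders, and the HAD/REPLICATION settings the central managers need for HA are omitted.

```
# --- On each central manager (placeholder values) ---
CENTRAL_MANAGER1 = cm1.example.org
CENTRAL_MANAGER2 = cm2.example.org
CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)
# This CM's subnet A address (e.g. cm1); HA-specific settings not shown.
NETWORK_INTERFACE = 10.0.0.11

# --- On the execute node, which sits on subnet B ---
CENTRAL_MANAGER1 = cm1.example.org
CENTRAL_MANAGER2 = cm2.example.org
CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)
DAEMON_LIST = MASTER, STARTD
```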
And it apparently works, but I was worried because I could
see the exec node twice in condor_status, which inflated the Total.
That was soon after I started the exec node, but now, after a
long weekend, it seems Condor corrected it somehow.
Why do you think it showed up twice?
Many thanks.