Re: [HTCondor-users] net topology
- Date: Fri, 26 May 2017 10:03:19 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] net topology
On 5/26/2017 6:20 AM, lejeczek wrote:
> hi everybody
> like earlier, a newbie here, trying to grasp all those concepts condor
> (may) offer.
> Question:
> Can a pool be configured with a central manager (or managers, in an HA
> setup) where only the central manager(s) would be submitting and the
> rest of the pool would be on a different subnet?
> You can probably envision what I'm asking: HA managers on 10.0.0.0,
> which users see, then the rest on 10.1.0.0, which users don't see, so
> users submit only via the central managers' 10.0.0.x addresses.
> Would such a setup work and be allowed by the design?
> m.! thanks
> L.
Not quite sure what you want from the above, but I think perhaps you
have missed an important point re the architecture of an HTCondor pool.
In an HTCondor pool, there is:
1. one central manager (CM), and one or more optional backup CMs if
you bother with the HA setup.
2. one or more submit machines.
3. one or more execute machines.
Any machine can serve one, two, or all three of the above roles,
simply based on which daemons are listed in DAEMON_LIST. Your central
manager(s) do NOT have to be the same as your submit machines. Any
machine in the pool that runs the "condor_schedd" daemon can act as a
submit machine, just by adding SCHEDD to the DAEMON_LIST config knob.
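As a rough sketch of the above, the three roles might map onto per-machine configuration like this. The hostname cm.example.org is a placeholder made up for illustration, each fragment would go in that machine's local config file, and security settings (ALLOW_WRITE and friends) are left out, so treat it as an outline rather than a complete configuration.

```
# --- central manager: runs the collector and negotiator ---
CONDOR_HOST = cm.example.org        # placeholder hostname
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# --- submit machine: runs the schedd ---
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, SCHEDD

# --- execute machine: runs the startd ---
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, STARTD

# --- one machine serving all three roles ---
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
```

Running condor_config_val DAEMON_LIST on a machine should show which daemons, and therefore which roles, it has picked up.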
Especially for larger pools, it is a good idea to have dedicated machine(s)
for each role: one central manager, one or more submit machines, one or more
execute machines. You can have as many submit machines as you want; here
at UW, we have a pool with 1 central manager, ~500 execute machines, and
80+ submit machines, as we have many submit machines that are embedded
within various research labs that only have logins for the researchers
in that lab. Meanwhile our central manager is located in the
centralized IT data center. If one of the 80+ submit machines goes
down, only the jobs submitted on that one submit machine are impacted;
all the other submit machines continue to operate as normal. More
details on this are at
http://research.cs.wisc.edu/htcondor/manual/v8.7/3_1Introduction.html#SECTION00411000000000000000
If you are asking "can I have a submit machine that is on two networks,
a public network that users can access via ssh to login, and a private
network that holds all my execute nodes and my central manager", the
answer is yes.
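For that dual-homed case, here is a sketch of what the submit machine's config might contain. The private address 10.1.0.5 and the hostname cm.private.example are assumptions made up for illustration, not values from this thread. The idea is that NETWORK_INTERFACE tells the HTCondor daemons which of the machine's addresses to use, so the schedd talks to the central manager and execute nodes over the private network while users still ssh in via the public interface. Whether you need it at all depends on your routing, so check the networking section of the manual for your version.

```
# Submit machine with two interfaces (all values are placeholders)
CONDOR_HOST = cm.private.example    # central manager on the private network
DAEMON_LIST = MASTER, SCHEDD        # submit role only
NETWORK_INTERFACE = 10.1.0.5        # this machine's private-network address
```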
Hope the above helps
Todd