[Condor-users] CM Failover with submits from CM
- Date: Tue, 14 Jul 2009 09:10:06 -0400
- From: Janzen Brewer <janzen.brewer@xxxxxxxxxxxxxxx>
I have an interesting problem. I believe I detailed the setup of my
organization's cluster/network in an earlier post, but I will repeat it
here:
Nine compute machines (running STARTD, SCHEDD) exist on a private subnet
and cannot be reached (by design) from the rest of my organization. They
are connected to a switch which is also connected to the primary and
secondary central managers. The primary/secondary CMs have two NICs
each. Each CM has an IP on the private subnet and on my organization's
public network.
Problem: I tried submitting a job from my workstation (on my
organization's public network), but the CMs tell it to talk to the
STARTD at a private address, which obviously doesn't work. I told my
boss about this and asked how he wanted to proceed, and he wants to
allow job submissions only from the CMs. This works, BUT he also wants
failover capability. I don't foresee this working well: when a CM goes
down, the submit machine goes down with it, even though the CM
functionality itself fails over.
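
To make the reachability problem concrete, here is a rough sketch in
Python. All hostnames and addresses in it are made up for illustration
(they are not our real subnets), and it only models "does this machine
have an interface on the same subnet as that address". It does not
model anything Condor itself does.

```python
import ipaddress

# Hypothetical subnets; the real ones are not given in this post.
PRIVATE = ipaddress.ip_network("192.168.10.0/24")  # compute nodes
PUBLIC = ipaddress.ip_network("172.16.0.0/16")     # organization network

# Hypothetical hosts and the addresses of their NICs.
HOSTS = {
    "workstation": ["172.16.5.20"],
    "cm-primary": ["172.16.0.11", "192.168.10.1"],
    "cm-secondary": ["172.16.0.12", "192.168.10.2"],
    "node1": ["192.168.10.101"],
}


def subnets(host):
    """Return the set of subnets on which the host has an interface."""
    return {
        net
        for addr in HOSTS[host]
        for net in (PRIVATE, PUBLIC)
        if ipaddress.ip_address(addr) in net
    }


def can_reach(src, dst_addr):
    """True if src has an interface on the subnet containing dst_addr.

    Assumes nothing routes between the two subnets, as in our setup.
    """
    dst = ipaddress.ip_address(dst_addr)
    return any(dst in net for net in subnets(src))


# The address the CM hands back for the STARTD is a private one.
startd_addr = HOSTS["node1"][0]
for submitter in ("workstation", "cm-primary", "cm-secondary"):
    print(submitter, can_reach(submitter, startd_addr))
```

Only the two dual-homed CMs can reach the node's address, which is why
submitting from my workstation fails and submitting from a CM works.
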
What kind of options do I have? My boss is adamant that the nodes stay
on a private subnet and that we have CM failover capability. I don't
think having a separate submit machine which straddles the private and
public networks (like the current CMs) will work, because that machine
would itself be a single point of failure and my boss wants none in
the system.