
Re: [Condor-users] CM Failover with submits from CM



Condor supports fail-over of the submit node.  I don't have experience 
with that, so I'll just focus on a different aspect of your question.

In Condor 7.x, you can use CCB to allow a submit machine to operate from 
outside of the private network, assuming you have outbound connectivity 
from the private network to the submit machine.  (Whether that still 
satisfies your boss's security concerns is a different question.  CCB 
access can be regulated using Condor's standard authentication and 
authorization options.)

To configure it, you would simply list both of your CMs in CCB_ADDRESS 
in the configuration of the execute nodes.  If one CM fails, things will 
automatically fail over to the other.  While both are functioning, CCB 
traffic will load-balance across the two.
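
As a rough sketch, the local configuration on each execute node might
look something like the following. The hostnames are placeholders, and
listing both CMs in COLLECTOR_HOST is an assumption about how your
fail-over setup is configured, so check the manual for your Condor
version before relying on it:

    # Hypothetical CM hostnames; use names or addresses on the
    # private subnet that the execute nodes can actually reach.
    COLLECTOR_HOST = cm1.private.example, cm2.private.example

    # Register this node's daemons with the CCB server on both CMs.
    CCB_ADDRESS = $(COLLECTOR_HOST)

The CCB server is part of the collector, so this should not require an
extra daemon on the CMs.
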
--Dan

Janzen Brewer wrote:
I have an interesting problem. I believe I detailed the setup of my organization's cluster/network in an earlier post, but I will repeat it here:

Nine compute machines (running STARTD, SCHEDD) exist on a private subnet 
and cannot be reached (by design) from the rest of my organization. They 
are connected to a switch which is also connected to the primary and 
secondary central managers. The primary/secondary CMs have two NICs 
each. Each CM has an IP on the private subnet and on my organization's 
public network.

Problem: I tried submitting a job from my workstation (on my 
organization's public network), but the CMs tell it to talk to the 
STARTD at a private address, which obviously doesn't work. I told my 
boss about this and asked how he wanted to proceed, and he wants to only 
allow job submissions from the CMs. This works, BUT he also wants 
failover capability. I don't foresee this working well since the submit 
machine goes down when the CM goes down, even though CM functionality 
fails over.

What kind of options do I have? My boss is adamant that the nodes stay 
on a private subnet and that we have CM failover capability. I don't 
think having a separate submit machine which straddles the private and 
public networks (like the current CMs) will work. My boss wants no 
single point of failure present in the system.
The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/