HTCondor Project List Archives




Re: [Condor-devel] RFC: GCB optimization for local network communication




On Dec 12, 2006, at 6:53 PM, Miron Livny wrote:

So, let me make sure that I understand what is going on - A "network address" in Condor will consist of three elements - IP/port, NETWORK_NAME, and GCB IP/port.

yes. however, to be clear, i think we should refer to these as the "public IP/port", "network name", and "local IP/port"...

The NETWORK_NAME will be used only when the address has a non-null GCB element, right?

basically, yeah, but it's this "GCB element" that is confusing the discussion. all daemons will always have a public IP/port. optionally, some daemons will have a network_name and local IP/port.

In other words, if a NETWORK_NAME is not provided, Condor will assume that the IP/port is in a "GLOBAL NETWORK NAME SPACE".

yes.

We also assume that the GCB is in this GLOBAL space.

not sure what you mean, but i think this is "we assume the GCB broker is in the global space", which is true.

If a GCB is not included, should Condor check that both parties are in the same NETWORK_SPACE before attempting to establish a connection?

no. if there's no network_name, we always use the public IP/port. if there is a network_name, we consider the network_name of our peer (if any), and if they match, we use the local IP/port, not the public IP/port. we never ask the GCB broker if we should use the public or local IP. however, sometimes when we use the public IP, it's really going to be a GCB broker.
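A minimal sketch of the selection rule just described, assuming a hypothetical `ContactInfo` record that holds the public IP/port plus an optional network name and local IP/port (these names are illustrative, not Condor's actual types). Note that the GCB broker is never consulted; it only shows up when the public address happens to point at it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContactInfo:
    """Hypothetical container for the three elements of a network id."""
    public_addr: str                    # always present, e.g. "192.0.2.10:9618"
    network_name: Optional[str] = None  # optional private network name
    local_addr: Optional[str] = None    # optional local IP/port on that network


def choose_peer_address(my_network_name: Optional[str], peer: ContactInfo) -> str:
    """Pick which of the peer's addresses to connect to (sketch only)."""
    # Without a network name of our own, always use the public IP/port.
    if my_network_name is None:
        return peer.public_addr
    # Same named network and a known local address: skip the GCB broker.
    if peer.network_name == my_network_name and peer.local_addr is not None:
        return peer.local_addr
    # Otherwise use the public IP/port (which may really be a GCB broker).
    return peer.public_addr
```

With a peer advertising `("192.0.2.10:9618", "cs-private", "10.0.0.5:9618")`, a caller on `cs-private` gets the local address, while a caller with no network name, or a different one, gets the public address.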

Can we envision cases where NETWORK_NAME will be used in a requirement expression?

sure, and/or Rank exprs. that's a good reason to actually use 3 attributes for these 3 elements of the network id. the counter argument is that having all 3 elements in 1 attribute makes it easier to pass around a single string that gives you everything you need to know to contact a Condor entity. alan proposed we make the new attr have all 3 elements in 1 giant syntax, and introduce that as the single new attr. we change condor more and more to use this if it exists, and fall back to the deprecated "StartdIpAddr" and friends if the "NetworkId" isn't defined. this gets us 1 step closer to "always just use the new way", even while maintaining backwards compatibility for the short term.
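A rough sketch of the single-attribute approach with a fallback to the deprecated attribute. The `public=...;name=...;local=...` syntax for `NetworkId` is purely hypothetical (no syntax was settled in this thread), and ClassAds are modeled as plain Python dicts. The `network_name` helper hints at how a Requirements or Rank check could pull just the network name out of the larger string.

```python
from typing import Dict, Optional


def parse_network_id(network_id: str) -> Dict[str, str]:
    """Split a hypothetical 'public=...;name=...;local=...' NetworkId string."""
    fields: Dict[str, str] = {}
    for part in network_id.split(";"):
        if not part:
            continue
        key, sep, value = part.partition("=")
        if sep:  # ignore malformed pieces that have no '='
            fields[key.strip()] = value.strip()
    return fields


def public_address(ad: Dict[str, str]) -> Optional[str]:
    """Prefer the new NetworkId attribute; fall back to StartdIpAddr."""
    network_id = ad.get("NetworkId")
    if network_id is not None:
        public = parse_network_id(network_id).get("public")
        if public is not None:
            return public
    # Older daemons only advertise the deprecated attribute; return it unchanged.
    return ad.get("StartdIpAddr")


def network_name(ad: Dict[str, str]) -> Optional[str]:
    """Extract just the network name, e.g. for a Requirements/Rank-style check."""
    network_id = ad.get("NetworkId")
    if network_id is None:
        return None
    return parse_network_id(network_id).get("name")
```

The design trade-off is visible here: one string is easy to pass around, but every consumer that wants only the network name has to parse it.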

perhaps ClassAd functions could be used to match against just the network_name from inside a larger NetworkId string. we'd assume 6.9.x for the existence of this attr, anyway, and we'd just say "if you have an older pool and you're trying to do fancy rank/requirement to match the network name, you must upgrade..."

but, it's also very tempting to just keep all 3 separate attrs to make it easy for anyone's condor to match against the network name directly, without sneaking classad function tricks.

final complication: a given multihomed machine might have N public and M local IP addrs. perhaps we really need a list of both, a nested classad, etc. we could punt on this for now and continue to assume a single public and private/local addr, and move to N addresses once we're in 7.0 and have new classads everywhere.
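If the single-address assumption is eventually dropped, the selection might generalize to lists roughly as follows. This is a speculative sketch: it assumes local addresses are advertised as hypothetical `(network_name, "ip:port")` pairs and that the first matching entry wins, neither of which was decided in this thread.

```python
from typing import List, Optional, Tuple


def choose_from_lists(
    my_network_name: Optional[str],
    public_addrs: List[str],
    local_addrs: List[Tuple[str, str]],
) -> Optional[str]:
    """Pick a contact address for a multihomed peer (hypothetical layout)."""
    if my_network_name is not None:
        for name, addr in local_addrs:
            if name == my_network_name:
                return addr  # first local address on our network wins
    # No matching private network: use the first public address, if any.
    return public_addrs[0] if public_addrs else None
```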

Why not just add an attribute for NETWORK_NAME and let the Condor software do the rest internally?

b/c daemons have to advertise their local IP/port for anyone else to contact them using this optimization. if you want to avoid the GCB broker (the whole point of this exercise), you have to know the real local IP/port somehow, so it must be present in the ClassAds (either as another attr or as part of a bigger NetworkId).
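To make the point concrete, here is a sketch of the address attributes a daemon might advertise under the separate-attribute variant. The attribute names `PublicIpAddr`, `NetworkName`, and `LocalIpAddr` follow the names used later in this thread but are not an established schema, and the ad is modeled as a plain dict. A peer can only bypass the GCB broker if the local IP/port actually appears in the ad.

```python
from typing import Dict, Optional


def build_address_attrs(
    public_addr: str,
    network_name: Optional[str] = None,
    local_addr: Optional[str] = None,
) -> Dict[str, str]:
    """Build the address attributes a daemon would advertise (sketch)."""
    attrs = {"PublicIpAddr": public_addr}
    # A local address is only usable by peers if it comes with a network name,
    # so advertise the pair together or not at all.
    if network_name is not None and local_addr is not None:
        attrs["NetworkName"] = network_name
        attrs["LocalIpAddr"] = local_addr
    return attrs
```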

I believe that our main GCB users at this point are doing it with Glide-ins, right?

seems that way, but a) those might just be the most vocal users and b) the CS pool is going to have to become GCB'inized at some point in the medium term, which is a big part of what's driving all the GCB development.

continue to be compatible, and would work... it's just a question of
if a given connection could use this optimization to skip talking to
the GCB broker or not.

Should this be the question or should we ask whether a connection requires a GCB in order to be established?

sorry, confusing use of language on my part... it's not a question at all. ;) i just meant that the presence of this info (either as NetworkId or as NetworkName and LocalIpAddr) determines if we try to use the optimization to skip talking to GCB or not. the "question" is if the attr(s) are there. if so, we use it, else, we fall back to the PublicIpAddr.

I vote (b).

i knew you would. ;) but, it's good to get all this hashed out and clear before i dive into the code.

thanks,
-derek