but I don't think, this is the root of the problem unless this initial start-up saves its state really deep.
HTCondor doesn't record its own address(es) in hard (on-disk) state, but -- my impression is -- a lot of the associated "soft" state can only be reset by restarting the daemon(s).
(1) why is this happening and how do I fix this?
The address `<172.23.33.3:0>` doesn't specify a port number and therefore effectively means "read from the address file on disk" (which is almost always `$(LOG)/.master_address`). It's quite possible that address has nothing to do with the correct address for contacting the master after (the network has come up and?) the proper configuration has been applied; I don't recall when, if at all, it ever gets rewritten.
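If it helps when scanning logs, here is a rough Python sketch for spotting addresses like that one. It only assumes the `<host:port>` shape shown above, optionally followed by `?params` before the closing `>`; the function names are made up for illustration and are not part of HTCondor, and it doesn't try to handle IPv6 literals.

```python
import re

# Assumed shape: <host:port>, optionally with ?params before the closing '>'.
# This is an illustration based on the address quoted above, not a full parser.
_ADDR_RE = re.compile(
    r"^<(?P<host>[^:<>?]+):(?P<port>\d+)(?:\?(?P<params>[^>]*))?>$"
)


def parse_address(text):
    """Split an address string into (host, port); raise ValueError if it doesn't match."""
    match = _ADDR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized address format: {text!r}")
    return match.group("host"), int(match.group("port"))


def has_real_port(text):
    """Return False when the port is 0, i.e. no usable port was specified."""
    _, port = parse_address(text)
    return port != 0


if __name__ == "__main__":
    for addr in ("<172.23.33.3:0>", "<172.23.33.3:9618>"):
        print(addr, "usable port" if has_real_port(addr) else "port 0: no port specified")
```

Anything this flags as port 0 is a candidate for the "came from a stale address file" explanation rather than something set in your config.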
If a hard restart of _all_ of the daemons doesn't fix this, let us know.
(2) why does it always seem to fall back to the 172 address?
I suspect that you need to restart _the master_ in order to change the address that any of its children use to try to contact it for keep-alives.
(3) why does it get a connection refused? There is no packet filter installed and the daemon is listening on 0.0.0.0:9618
The _shared port daemon_ is listening on that address. I don't recall how the shared port daemon itself contacts the master -- I think it just does the socket hand-off directly -- but it's not a pure TCP connection.
As to why the master isn't picking up, I don't know. It might say something in the master log; it may be that the shared port daemon's socket hand-off isn't happening because the master's shared port ID changed when the shared port daemon reconfigured.
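To narrow down where the refusal comes from, a plain TCP probe can at least tell you whether anything accepts connections on a given address and port. This is a generic sketch using only the Python standard library; the `probe` name and the addresses in the usage lines are placeholders. Note that an "open" result only says the listener (here, presumably the shared port daemon) accepted the connection; it says nothing about whether the hand-off to the master works.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a bare TCP connect and classify the outcome.

    Returns "open" if the connection is accepted, "refused" if the peer
    actively refuses it, and "timeout-or-unreachable" for other failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, unreachable networks, and name resolution errors.
        return "timeout-or-unreachable"


if __name__ == "__main__":
    # Placeholder targets: the address from the log and the local listener.
    for host in ("172.23.33.3", "127.0.0.1"):
        print(host, 9618, probe(host, 9618))
```

If the 172 address comes back "refused" while the address you actually configured is "open", that points back at the stale address rather than at a filter or a dead listener.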
It sounds like you intend for this installation to be rootly; if it is, you should never have a reason to specify :0 anywhere, or indeed, any port number at all; HTCondor will assume 9618. This may be a consequence of starting HTCondor before the config management system is done (that is, it may not be in your config at all); if so, that may be another reason to delay start-up until the configuration is complete.
-- ToddM