Hi all,

I am trying to figure out why our publicly reachable schedd nodes are consistently picking the wrong network interface. According to the docs, leaving NETWORK_INTERFACE unset should make the daemons pick the interface used for talking to the collector:

    NETWORK_INTERFACE
    [...] If multiple network interfaces match the value and ENABLE_ADDRESS_REWRITING is True (the default), the IP address that is chosen to be advertised will be the one that is used to communicate with the condor_collector. [...]

All daemons (Master, Schedd, SharedPort) resolve our collector to its internal address [1]. However, at startup they have already picked their own external address for themselves [2], and that address is never updated to the internal one. SharedPort regularly updates its statistics, but sticks to the external address [3].

For now, I had to manually force the internal address via NETWORK_INTERFACE=10.*

Cheers,
Max

[1] /var/log/condor/SharedPortLog
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) COLLECTOR_HOST is set to "lrms-htcondor-1-kit.gridka.de"
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Checking if lrms-htcondor-1-kit.gridka.de is a sinful address
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) lrms-htcondor-1-kit.gridka.de is not a sinful address: does not begin with "<"
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) New Daemon obj (collector) name: "lrms-htcondor-1-kit.gridka.de", pool: "NULL", addr: "NULL"
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Using name "lrms-htcondor-1-kit.gridka.de" to find daemon
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Port not specified, using default (9618)
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Host info "lrms-htcondor-1-kit.gridka.de" is a hostname, finding IP address
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) DNS returned:
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME)     10.97.13.108
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) We returned:
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME)     10.97.13.108
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Found IP address and port <10.97.13.108:9618>

[2] /var/log/condor/SharedPortLog
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) NETWORK_INTERFACE=* matches lo 127.0.0.1, eth0 192.108.45.30, eth1 10.33.1.130, lo ::1, eth0 fe80::217:8ff:fe50:d732, eth1 fe80::217:8ff:fe50:d731, choosing IP 192.108.45.30
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) DNS returned:
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)     10.33.1.130
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)     192.108.45.30
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) We returned:
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)     10.33.1.130
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)     192.108.45.30
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) I like it.
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) hostname: gridka30 (score 4) new winner
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) I am: hostname: gridka30, fully qualified doman name: gridka30, IP: 192.108.45.30, IPv4: 192.108.45.30, IPv6: ::1

[3] /var/log/condor/SharedPortLog
02/14/17 14:53:18 (pid:12498) (D_ALWAYS) About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
ForkedChildrenPeak = 0
RequestsBlocked = 34
RequestsPendingCurrent = 0
MyAddress = "<192.108.45.30:9618?addrs=192.108.45.30-9618+[--1]-9618&noUDP>"
RequestsPendingPeak = 1
RequestsFailed = 34
SharedPortCommandSinfuls = "<192.108.45.30:9618>,<[::1]:9618>"
ForkedChildrenCurrent = 0
RequestsSucceeded = 6

[4]
$ condor_config_val IP_ADDRESS ENABLE_ADDRESS_REWRITING NETWORK_INTERFACE BIND_ALL_INTERFACES
192.108.45.30
true
*
true
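
For cross-checking which local address the kernel would actually use to reach the collector, a small Python sketch along these lines may help. It is not part of HTCondor and says nothing about how the daemons choose their address internally; it only asks the routing table. The function name source_ip_for is made up for illustration, and the default port 9618 is taken from the log in [1].

```python
import socket


def source_ip_for(host, port=9618):
    """Return the local IPv4 address the kernel would use to reach host:port.

    Hypothetical helper for illustration only. Connecting a UDP socket
    sends no packets; it just makes the kernel select a route and bind
    a local source address, which getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((host, port))
        return sock.getsockname()[0]
```

Calling source_ip_for("lrms-htcondor-1-kit.gridka.de") on the schedd node should report the address of whichever interface routes to the collector. If that is the internal address while the daemons still advertise 192.108.45.30, the mismatch is on the HTCondor side rather than in the host's routing.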