
[HTCondor-users] Dual stacked hosts in condor-24.2.2



We've found a change in behavior on our dual-stacked ipv4+ipv6 access points after upgrading from htcondor-23.10.2 to htcondor-24.2.2. This is causing the shared_port service to fail to start.

Downgrading to htcondor-24.2.1 allows it to work again. The release notes did not mention any network-related changes from 24.2.1 to 24.2.2, so this was a surprise to us.



Our access points are dual-homed, but only dual-stacked on the public interface (ignoring the link local address on the private interface):

eno1 (private): 10.13.5.32/16 (no ipv6 link-local address)
eno2 (public): 208.69.128.69/26, 2607:f390:3ff2:16::69/64

Our condor config is set to prefer the ipv4 addresses:


# condor_config_val -dump | grep -i ipv
ADVERTISE_IPV4_FIRST = $(PREFER_IPV4)
ENABLE_IPV4 = auto
ENABLE_IPV6 = auto
IGNORE_DNS_PROTOCOL_PREFERENCE = $(PREFER_IPV4)
IGNORE_TARGET_PROTOCOL_PREFERENCE = $(PREFER_IPV4)
IP_ADDRESS_IS_IPV6 = false
IPV4_ADDRESS = 10.13.5.32
IPV6_ADDRESS = 2607:f390:3ff2:16::69
PREFER_IPV4 = true
PREFER_OUTBOUND_IPV4 = $(PREFER_IPV4)

# condor_config_val -dump | grep -i network
NETWORK_HOSTNAME =
NETWORK_INTERFACE = eno1
NETWORK_MAX_PENDING_CONNECTS = 0
OPENMPI_EXCLUDE_NETWORK_INTERFACES = docker0,virbr0
PRIVATE_NETWORK_INTERFACE = eno1
PRIVATE_NETWORK_NAME = ldasinternal
VM_NETWORKING = false
VM_NETWORKING_DEFAULT_TYPE =
VM_NETWORKING_MAC_PREFIX =
VM_NETWORKING_TYPE =
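
For reference, here is a small Python sketch that maps the configured addresses back to the interfaces listed above. It only uses the addresses and prefixes quoted in this message; the INTERFACES table and the owning_interface helper are made up for illustration and are not part of HTCondor.

```python
import ipaddress

# Interface addresses as listed in this message (eno1 private, eno2 public).
INTERFACES = {
    "eno1": ["10.13.5.32/16"],
    "eno2": ["208.69.128.69/26", "2607:f390:3ff2:16::69/64"],
}

def owning_interface(address, interfaces=INTERFACES):
    """Return the name of the interface whose network contains `address`, or None."""
    addr = ipaddress.ip_address(address)
    for name, cidrs in interfaces.items():
        for cidr in cidrs:
            net = ipaddress.ip_interface(cidr).network
            # Only compare addresses of the same IP version.
            if addr.version == net.version and addr in net:
                return name
    return None

if __name__ == "__main__":
    # Values taken from the condor_config_val dump above.
    for knob, value in [("IPV4_ADDRESS", "10.13.5.32"),
                        ("IPV6_ADDRESS", "2607:f390:3ff2:16::69")]:
        print(knob, value, "->", owning_interface(value))
```

With the values above, IPV4_ADDRESS maps to eno1 while IPV6_ADDRESS maps to eno2, even though NETWORK_INTERFACE is set to eno1. Whether that split is related to the failure is only a guess on our part.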


The error message in the MasterLog indicates that it can't start the shared_port service:

12/10/24 16:22:21 Starting shared port with port: 9618
12/10/24 16:22:21 Sock::bind failed: errno = 22 Invalid argument
12/10/24 16:22:21 Failed to listen(9618) on TCP/IPv6 command socket. Does this computer have IPv6 support?
12/10/24 16:22:21 Warning: Failed to create IPv6 command socket for ports 9618/9618no UDP
12/10/24 16:22:21 ERROR: Create_Process failed trying to start /usr/libexec/condor/condor_shared_port
12/10/24 16:22:21 restarting /usr/libexec/condor/condor_shared_port in 11 seconds

And very similar messages for condor_credd (not surprising because it makes use of the shared_port service):

12/10/24 16:22:45 ERROR: Create_Process failed trying to start /usr/sbin/condor_credd
12/10/24 16:22:45 restarting /usr/sbin/condor_credd in 17 seconds
12/10/24 16:23:02 Starting shared port with port: 9618
12/10/24 16:23:02 Sock::bind failed: errno = 22 Invalid argument
12/10/24 16:23:02 Failed to listen(9618) on TCP/IPv6 command socket. Does this computer have IPv6 support?
12/10/24 16:23:02 Warning: Failed to create IPv6 command socket for ports 9618/9618no UDP
12/10/24 16:23:02 ERROR: Create_Process failed trying to start /usr/libexec/condor/condor_shared_port
12/10/24 16:23:02 restarting /usr/libexec/condor/condor_shared_port in 25 seconds
12/10/24 16:23:02 Sock::bind failed: errno = 22 Invalid argument
12/10/24 16:23:02 Failed to listen(9620) on TCP/IPv6 command socket. Does this computer have IPv6 support?
12/10/24 16:23:02 Warning: Failed to create IPv6 command socket for ports 9620/9620
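
To check outside of HTCondor whether the host itself will accept a bind on a given address, something like the following Python sketch can be run on the access point. It is only a rough probe: try_bind is a made-up helper, it binds to an ephemeral port (port 0) so it does not collide with 9618, and it does not reproduce how condor_shared_port sets up its sockets.

```python
import errno
import ipaddress
import socket

def try_bind(address, port=0):
    """Try to bind a TCP socket to (address, port).

    Returns None on success, or the errno of the failure.
    """
    version = ipaddress.ip_address(address).version
    family = socket.AF_INET6 if version == 6 else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, port))
    except OSError as exc:
        return exc.errno
    return None

if __name__ == "__main__":
    # Addresses from the config dump earlier in this message.
    for addr in ["10.13.5.32", "2607:f390:3ff2:16::69"]:
        err = try_bind(addr)
        if err is None:
            print(addr, "bind ok")
        else:
            print(addr, "bind failed: errno", err, errno.errorcode.get(err, "?"))
```

If the probe reports errno 22 (EINVAL) for an address, that would match the Sock::bind failure in the MasterLog; a different result would suggest the problem lies in which address HTCondor picks rather than in the host's IPv6 setup.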

--Mike