Re: [HTCondor-users] Dual stacked hosts in condor-24.2.2
- Date: Tue, 17 Dec 2024 14:42:02 +0000
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Dual stacked hosts in condor-24.2.2
It looks like in versions 24.1.x and before, the value of NETWORK_INTERFACE is ignored for IPv6. The daemons always bind to and advertise the most public IPv6 address available on the system. In 24.2.x, NETWORK_INTERFACE is respected for IPv6, but the daemons mistakenly believe that if there's any IPv6 address on the system, then one is available for binding on the specified interface. This leads to the failure you're seeing.
We will work on fixing that.
- Jaime
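
To make the distinction above concrete, here is a minimal sketch of a per-interface check, as opposed to a system-wide one. It is not HTCondor's code. The function names are hypothetical, and it assumes the Linux /proc/net/if_inet6 layout (hex address, index, prefix length, scope, flags, interface name). The sample text mirrors the hosts described in the quoted report: the only non-link-local IPv6 address is on eno2, and eno1 has none.

```python
import ipaddress


def usable_ipv6_by_interface(if_inet6_text):
    """Map interface name -> list of IPv6 addresses that are neither
    link-local nor loopback, parsed from /proc/net/if_inet6-style text.

    Hypothetical helper for illustration only.
    """
    result = {}
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        hex_addr, _index, _prefix_len, _scope, _flags, name = fields
        addr = ipaddress.IPv6Address(int(hex_addr, 16))
        if addr.is_link_local or addr.is_loopback:
            continue
        result.setdefault(name, []).append(str(addr))
    return result


def interface_has_usable_ipv6(if_inet6_text, interface):
    """True only if the named interface itself carries a usable IPv6
    address; an address elsewhere on the system does not count."""
    return bool(usable_ipv6_by_interface(if_inet6_text).get(interface))


# Sample input modelled on the report below: loopback plus one
# global address on eno2, nothing on eno1.
SAMPLE = (
    "00000000000000000000000000000001 01 80 10 80 lo\n"
    "2607f3903ff200160000000000000069 03 40 00 80 eno2\n"
)

system_wide = bool(usable_ipv6_by_interface(SAMPLE))       # any address anywhere
on_eno1 = interface_has_usable_ipv6(SAMPLE, "eno1")         # the configured interface
on_eno2 = interface_has_usable_ipv6(SAMPLE, "eno2")
```

On a real Linux host, the contents of /proc/net/if_inet6 could be passed in place of SAMPLE. With NETWORK_INTERFACE = eno1, the system-wide answer and the per-interface answer disagree, which is the situation described above.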
> On Dec 16, 2024, at 4:20 PM, Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
> The IPv6 networking changes for Windows did affect code common to all platforms.
>
> What is BIND_ALL_INTERFACES set to? If it's set to false, meaning you want to ignore the public interface completely, then you can set ENABLE_IPV6=False so that HTCondor ignores IPv6 completely.
>
> - Jaime
>
>> On Dec 10, 2024, at 4:36 PM, Michael Thomas <wart@xxxxxxxxxxx> wrote:
>>
>> Minor correction: Downgrading from 24.2.2 to 24.1.1 allows it to work again (I wasn't looking close enough at the version string).
>>
>> I wonder if the 'IPv6 networking is now fully supported on Windows' introduced some different behavior in the RL8 packages?
>>
>> --Mike
>>
>> On 12/10/24 16:32, Michael Thomas wrote:
>>> We've found a change in behavior on our dual-stacked ipv4+ipv6 access points after upgrading from htcondor-23.10.2 to htcondor-24.2.2. This is causing the shared_port service to fail to start.
>>> Downgrading to htcondor-24.2.1 allows it to work again. The release notes did not mention any network-related changes from 24.2.1 to 24.2.2, so this was a surprise to us.
>>> Our access points are dual-homed, but only dual-stacked on the public interface (ignoring the link local address on the private interface):
>>> eno1 (private): 10.13.5.32/16 (no ipv6 link-local address)
>>> eno2 (public): 208.69.128.69/26, 2607:f390:3ff2:16::69/64
>>> Our condor config is configured to prefer the ipv4 addresses:
>>> # condor_config_val -dump | grep -i ipv
>>> ADVERTISE_IPV4_FIRST = $(PREFER_IPV4)
>>> ENABLE_IPV4 = auto
>>> ENABLE_IPV6 = auto
>>> IGNORE_DNS_PROTOCOL_PREFERENCE = $(PREFER_IPV4)
>>> IGNORE_TARGET_PROTOCOL_PREFERENCE = $(PREFER_IPV4)
>>> IP_ADDRESS_IS_IPV6 = false
>>> IPV4_ADDRESS = 10.13.5.32
>>> IPV6_ADDRESS = 2607:f390:3ff2:16::69
>>> PREFER_IPV4 = true
>>> PREFER_OUTBOUND_IPV4 = $(PREFER_IPV4)
>>> # condor_config_val -dump | grep -i network
>>> NETWORK_HOSTNAME =
>>> NETWORK_INTERFACE = eno1
>>> NETWORK_MAX_PENDING_CONNECTS = 0
>>> OPENMPI_EXCLUDE_NETWORK_INTERFACES = docker0,virbr0
>>> PRIVATE_NETWORK_INTERFACE = eno1
>>> PRIVATE_NETWORK_NAME = ldasinternal
>>> VM_NETWORKING = false
>>> VM_NETWORKING_DEFAULT_TYPE =
>>> VM_NETWORKING_MAC_PREFIX =
>>> VM_NETWORKING_TYPE =
>>> The error message in the MasterLog indicates that it can't start the shared_port service:
>>> 12/10/24 16:22:21 Starting shared port with port: 9618
>>> 12/10/24 16:22:21 Sock::bind failed: errno = 22 Invalid argument
>>> 12/10/24 16:22:21 Failed to listen(9618) on TCP/IPv6 command socket. Does this computer have IPv6 support?
>>> 12/10/24 16:22:21 Warning: Failed to create IPv6 command socket for ports 9618/9618no UDP
>>> 12/10/24 16:22:21 ERROR: Create_Process failed trying to start /usr/libexec/condor/condor_shared_port
>>> 12/10/24 16:22:21 restarting /usr/libexec/condor/condor_shared_port in 11 seconds
>>> And very similar messages for condor_credd (not surprising because it makes use of the shared_port service):
>>> 12/10/24 16:22:45 ERROR: Create_Process failed trying to start /usr/sbin/condor_credd
>>> 12/10/24 16:22:45 restarting /usr/sbin/condor_credd in 17 seconds
>>> 12/10/24 16:23:02 Starting shared port with port: 9618
>>> 12/10/24 16:23:02 Sock::bind failed: errno = 22 Invalid argument
>>> 12/10/24 16:23:02 Failed to listen(9618) on TCP/IPv6 command socket. Does this computer have IPv6 support?
>>> 12/10/24 16:23:02 Warning: Failed to create IPv6 command socket for ports 9618/9618no UDP
>>> 12/10/24 16:23:02 ERROR: Create_Process failed trying to start /usr/libexec/condor/condor_shared_port
>>> 12/10/24 16:23:02 restarting /usr/libexec/condor/condor_shared_port in 25 seconds
>>> 12/10/24 16:23:02 Sock::bind failed: errno = 22 Invalid argument
>>> 12/10/24 16:23:02 Failed to listen(9620) on TCP/IPv6 command socket. Does this computer have IPv6 support?
>>> 12/10/24 16:23:02 Warning: Failed to create IPv6 command socket for ports 9620/9620
>>> --Mike
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/