
[HTCondor-users] HAD replication and link-local addresses in 24.0



Hi,

 

The third issue to report with the 24.0 CMs is that HAD replication seems to be broken in our setup, which is a real problem for us.

 

The problem seems to be the one reported at https://opensciencegrid.atlassian.net/browse/HTCONDOR-2453, i.e. a link-local address being returned when resolving the local hostname. At least, we do see the link-local address (fe80::...) in the logs, as described in the bug report. We don't see this with the 23.0 CMs.
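For anyone wanting to check what the system resolver returns for the local hostname, independently of HTCondor, a minimal Python sketch follows. It is only a diagnostic aid: the helper name classify is made up for illustration, and it does not reproduce how HTCondor itself picks its addresses.

```python
import ipaddress
import socket


def classify(addrinfos):
    """Split getaddrinfo() results into (link_local, other) address lists."""
    link_local, other = [], []
    for _family, _type, _proto, _canon, sockaddr in addrinfos:
        # sockaddr[0] is the textual address; drop any "%zone" suffix
        text = sockaddr[0].split("%", 1)[0]
        addr = ipaddress.ip_address(text)
        bucket = link_local if addr.is_link_local else other
        if text not in bucket:
            bucket.append(text)
    return link_local, other


if __name__ == "__main__":
    host = socket.getfqdn()
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        print(f"could not resolve {host}: {exc}")
    else:
        ll, rest = classify(infos)
        print(f"{host}: link-local={ll} other={rest}")
```

If the link-local list is non-empty on a CM, the resolver (rather than HTCondor alone) is handing out fe80:: addresses for the hostname, which would be consistent with the bug report.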

 

E.g. in HADLog of sleepybird03:

 

12/13/24 10:29:23 HADStateMachine::initializeHADList my address '<188.184.103.96:51450?addrs=188.184.103.96-51450+[2001-1458-d00-3b--100-30d]-51450&alias=sleepybird03.cern.ch>' vs. address in the list '<[fe80::f816:3eff:fe7f:e18d]:51450>'

 

And in MasterLog:

 

12/13/24 11:21:20 Started DaemonCore process "/usr/sbin/condor_replication", pid and pgroup = 409373

12/13/24 11:21:20 attempt to connect to <[fe80::f816:3eff:fe7f:e18d]:9618> failed: Invalid argument (connect errno = 22).  Will keep trying for 20 total seconds (20 to go).

12/13/24 11:21:40 attempt to connect to <[fe80::f816:3eff:fe7f:e18d]:9618> failed: Invalid argument (connect errno = 22).

12/13/24 11:21:40 ERROR: SECMAN:2003:TCP connection to collector sleepybird03.cern.ch:9618 failed.
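A likely reading of the errno 22 above (an assumption on my side, not confirmed from the HTCondor code): on Linux, connect() to a link-local IPv6 address fails with EINVAL when no interface scope is attached to the address, and the address in the log carries no %zone. The small Python sketch below just checks a logged address for that condition; the helper name missing_scope is hypothetical.

```python
import errno
import ipaddress


def missing_scope(host_text):
    """True if the text is a link-local IPv6 address with no %zone suffix."""
    text = host_text.strip("[]")
    addr_text, _, zone = text.partition("%")
    addr = ipaddress.ip_address(addr_text)
    return addr.version == 6 and addr.is_link_local and not zone


# Symbolic name of errno 22 on this platform
print(errno.errorcode.get(22))
# Address as it appears in the MasterLog lines above
print(missing_scope("[fe80::f816:3eff:fe7f:e18d]"))  # True
```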

 

 

Finally, the terminating error shown in HADLog is:

 

12/13/24 10:29:23 HAD CONFIGURATION ERROR:  my address '<188.184.103.96:51450?addrs=188.184.103.96-51450+[2001-1458-d00-3b--100-30d]-51450&alias=sleepybird03.cern.ch>'is not present in HAD_LIST 'sleepybird01.cern.ch:51450, sleepybird02.cern.ch:51450, sleepybird03.cern.ch:51450'

12/13/24 10:29:23 main_shutdown_graceful

 

Shortly after, the condor_had process exits, so it cannot receive messages from the other CMs.

 

I see the bug was fixed in 23.9.6; was the fix not carried over to 24.0? Is there perhaps a workaround in the meantime (e.g. adjusting nsswitch.conf)?

 

Thanks a lot.

 

Cheers,

   Antonio