Dear all,
I have news about our issue with the condor_collector. We updated HTCondor on our CE to the stable version 8.6.4. Unfortunately, the error persisted at first: the condor_collector crashed again after a restart with CAs 1.8.4, and we had to kill all running jobs to get the condor_collector daemon running again without crashes.
I've been looking at Yutaro Iiyama's issue, and it seems quite similar, although it affects the condor_schedd daemon (our error is attached). We are running a dual-stack pool with a single IPv6-only WN, so the general configuration of our condor pool is:
ENABLE_IPV4 = auto
ENABLE_IPV6 = auto
PREFER_IPV4 = true
I've added the same lines to the condor-ce configuration. I've restarted condor-ce a few times and it now seems stable, but I'm seeing several messages like the ones that appeared before the crash:
Failed to send DC_INVALIDATE_KEY to daemon at <IPV4:3196>: SECMAN:2003:TCP connection to daemon at <IPV4:31986> failed.
DC_AUTHENTICATE: attempt to open invalid session ce13:1272309:1501056815:3, failing; this session was requested by <IPV4:27682> with return address <IPV4?addrs=IPV4-21917+[IPV6]-21917&alias=name>
I will report any other problem related to this issue in the future.
Thank you very much.
Cheers,
Carles