Thank you Edgar, you were right about ALLOW_WRITE, but setting
ALLOW_DAEMON = $(ALLOW_DAEMON),pilot04@cms
didn't work. I had to add Brunel's IPv6 address explicitly to ALLOW_DAEMON to get it to work.
After setting
ALLOW_DAEMON = $(ALLOW_DAEMON),2001:630:10:f001::19a0
in both the CCB and the Collector, I started to see glideins connecting back to the collector. I also needed a special HA setting to prevent the negotiator from going to the Fermi Central Manager whenever I set PREFER_IPV4=False.
I have set up one schedd for IPv6 and sent some test jobs. Negotiation now goes well, but I see a problem with the claim process. The logs show the following:
Schedd [1].
CCB [2]
Startd [3]
What I find strange is that the Startd is trying to connect to the Schedd (188.184.94.50) using IPv4. We couldn't find any reference to the IPv6 address of the schedd within the logs. Any thoughts?
Regards,
Diego
[1]
03/28/17 11:33:34 Timed out requesting claim
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila after REQUEST_CLAIM_TIMEOUT=240 seconds.
03/28/17 11:33:34 Match record (
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila, 171.2) deleted
03/28/17 11:33:34 Canceling request for claim
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila 171.2
03/28/17 11:33:34 SECMAN: resuming command 442 REQUEST_CLAIM to startd
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx<127.0.0.1:21711>#1490692382#1#... for ddavila from TCP port -1 (non-blocking).
03/28/17 11:33:34 SECMAN: TCP connection to startd
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx<127.0.0.1:21711>#1490692382#1#... for ddavila failed.
03/28/17 11:33:34 Failed to send REQUEST_CLAIM to startd
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx<127.0.0.1:21711>#1490692382#1#... for ddavila: SECMAN:2003:TCP connection to startd
glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila failed.|CEDAR:6007:operation was canceled
03/28/17 11:33:34 CLOSE TCP <[2001:1458:201:e4::100:62c]:16101> fd=17
[2]
03/28/17 11:33:35 CCB: received request id 19416 from SCHEDD <188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179> on <[2001:1458:201:e4::100:62c]:40045> for target ccbid 17198 (registered as STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>)
03/28/17 11:33:35 Address rewriting: refused for attribute MyAddress (MyAddress = "<188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>"): the address isn't my default address. (Default: <188.185.81.179:9644?addrs=[2001-1458-d00-2--100-1ad]-9644+188.185.81.179-9644>, found in ad: <188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>)
03/28/17 11:33:35 encrypting secret
03/28/17 11:33:35 condor_write(fd=22 STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>,,size=408,timeout=1,flags=0,non_blocking=0)
03/28/17 11:34:31 condor_read(fd=22 STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>,,size=21,timeout=1,flags=0,non_blocking=1)
03/28/17 11:34:31 condor_read(fd=22 STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>,,size=263,timeout=1,flags=0,non_blocking=1)
03/28/17 11:34:31 encrypting secret
03/28/17 11:34:31 CCB: received error from target daemon STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346> with ccbid 17198 for request 19415 from (client which has gone away): failed to connect
03/28/17 11:34:31 CCB: client for request 19415 to target daemon STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346> with ccbid 17198 disappeared before receiving error details.
03/28/17 11:35:02 CollectorAd : Updating ... "< Personal Condor at
vocms0803.cern.ch@xxxxxxxxxxxxxxxxx >"
03/28/17 11:35:02 Trying to update collector <[2001:1458:201:e4::100:535]:9618>
03/28/17 11:35:02 Attempting to send update via UDP to collector
vocms0807.cern.ch <[2001:1458:201:e4::100:535]:9618>
03/28/17 11:35:02 Guess address string for host = <[2001:1458:201:e4::100:535]:9618>, port = 0
03/28/17 11:35:02 it was sinful string. ip = 2001:1458:201:e4::100:535, port = 9618
03/28/17 11:35:02 _condorOutMsg MTU changed from default to 60000
03/28/17 11:35:02 SECMAN: command 19 UPDATE_COLLECTOR_AD to collector
vocms0807.cern.ch:9618 from UDP port 32109 (blocking, raw).
03/28/17 11:35:02 SECMAN: no cached key for {<[2001:1458:201:e4::100:535]:9618>,<19>}.
03/28/17 11:35:02 SECMAN: Security Policy:
[3]
03/28/17 09:44:49 (pid:3625703) attempt to connect to <131.225.205.29:9668> failed: Network is unreachable (connect errno = 101).
03/28/17 09:44:49 (pid:3625703) ERROR: SECMAN:2003:TCP connection to collector
cmssrv215.fnal.gov:9668 failed.
03/28/17 09:44:49 (pid:3625703) Failed to start non-blocking update to <131.225.205.29:9668>.
03/28/17 09:48:26 (pid:3625703) attempt to connect to <188.184.94.50:4080> failed: Network is unreachable (connect errno = 101). Will keep trying for 300 total seconds (300 to go).
03/28/17 09:49:03 (pid:3625703) attempt to connect to <188.184.94.50:4080> failed: Network is unreachable (connect errno = 101).
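
For reading the address strings in [2]: judging from these log lines, the addrs= field appears to list every address a daemon advertises, separated by "+", with the colons of IPv6 addresses written as "-" and the port after the final "-". The following is a minimal sketch under that assumption, not an HTCondor tool; the function name decode_addrs is made up for illustration. It decodes the field so that the IPv4 and IPv6 entries can be compared directly.

```python
import ipaddress
import re


def decode_addrs(sinful):
    """Return (ip, port) pairs from the addrs= field of an address string.

    Assumes the encoding seen in the logs above: entries separated by '+',
    IPv6 addresses in brackets with ':' written as '-', port after the
    last '-'. Returns an empty list when there is no addrs= field.
    """
    match = re.search(r"[?&]addrs=([^&>]*)", sinful)
    if match is None:
        return []
    pairs = []
    for entry in match.group(1).split("+"):
        host, port = entry.rsplit("-", 1)
        if host.startswith("["):
            # Bracketed entry: restore the colons of the IPv6 address.
            host = host[1:-1].replace("-", ":")
        pairs.append((ipaddress.ip_address(host), int(port)))
    return pairs


# Schedd address exactly as it appears in the CCB log [2].
schedd = ("<188.184.94.50:4080?addrs=188.184.94.50-4080+"
          "[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>")

for ip, port in decode_addrs(schedd):
    print("IPv%d" % ip.version, ip, port)
```

Applied to the schedd address from [2], this yields one IPv4 and one IPv6 entry, both on port 4080, which suggests the schedd does advertise its IPv6 address even though the Startd log [3] only shows attempts against 188.184.94.50.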