Re: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in ITB pool


Date: Tue, 28 Mar 2017 17:55:40 +0000
From: Zach Miller <zmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in ITB pool
Huh.  Although I am familiar with the security side of things, I have to admit I have no experience with IPv6.  I will need to investigate, probably with Todd Miller's help.  Thanks for the report and I will get back to you.


Cheers,
-zach


> -----Original Message-----
> From: HTCondor-devel [mailto:htcondor-devel-bounces@xxxxxxxxxxx] On Behalf
> Of Brian Bockelman
> Sent: Tuesday, March 28, 2017 12:44 PM
> To: Condor Developers <htcondor-devel@xxxxxxxxxxx>
> Subject: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in
> ITB pool
> 
> Hi HTCondor folk,
> 
> The claim from the CMS pilot operators is that the following does not match
> IPv6 addresses:
> 
> ALLOW_DAEMON=*
> 
> (They've had to explicitly list each worker node's IP address to move
> forward in testing...)
> 
> Can someone confirm / deny that fact?
> 
> Additionally, can someone look at the CCB log [2] below?  Seems the
> connection reversing of the startd back to schedd is attempting to go over
> v4, despite this being a V6-only host.  MyAddress as sent by the CCB
> contains both V4 and V6; V4 appears to be selected.  Thoughts?
> 
> Thanks,
> 
> Brian
> 
> 
> 
> 	Begin forwarded message:
> 
> 	From: Diego Davila Foyo <diego.davila@xxxxxxx>
> 
> 	Subject: RE: [Osg-gfactory-support] About IPv6 tests in ITB pool
> 
> 	Date: March 28, 2017 at 7:30:24 AM CDT
> 
> 	To: Edgar M Fajardo Hernandez <emfajardohernandez@xxxxxxxxxxxxxxxx>
> 
> 	Cc: Jeffrey Michael Dost <jdost@xxxxxxxx>,
> "bbockelm@xxxxxxxxxxx" <bbockelm@xxxxxxxxxxx>,
> Marian Zvada <Marian.Zvada@xxxxxxx>,
> "Farrukh Aftab Khan" <farrukh.aftab.khan@xxxxxxx>,
> "emfajard@xxxxxxxx" <emfajard@xxxxxxxx>,
> "osg-gfactory-support@xxxxxxxxxxxxxxxx"
> <osg-gfactory-support@xxxxxxxxxxxxxxxx>,
> Todor Trendafilov Ivanov <todor.trendafilov.ivanov@xxxxxxx>,
> Andrea Sciaba <Andrea.Sciaba@xxxxxxx>,
> Duncan Rand <duncan.rand@xxxxxxxxxxxxxx>,
> Marco Mascheroni <marco.mascheroni@xxxxxxx>,
> Raul Cardoso Lopes <raul.cardoso.lopes@xxxxxxx>
> 
> 
> 	Thank you Edgar, you were right about ALLOW_WRITE, but setting
> ALLOW_DAEMON = $(ALLOW_DAEMON),pilot04@cms didn't work. I had to add
> Brunel's IPv6 address explicitly to ALLOW_DAEMON to get it to work.
> 
> 	After setting ALLOW_DAEMON = $(ALLOW_DAEMON),2001:630:10:f001::19a0
> in both the CCB and the Collector, I started to see glideins connecting back
> to the collector. I had to do a special setting for HA in order to prevent
> the negotiator from going to the Fermi Central Manager whenever I set
> PREFER_IPV4=False.
> 
> 
> 	I have set up one schedd for IPv6 and sent some test jobs. The
> negotiation process now goes well, but I see a problem with the claim
> process. In the logs I can see the following:
> 	Schedd [1].
> 	CCB [2]
> 	Startd [3]
> 
> 	What I find strange is that the Startd is trying to connect to the
> Schedd (188.184.94.50) using ipv4. We couldn't find any reference to the
> ipv6 address of the schedd within the logs. Any thoughts?
> 
> 	Regards,
> 
> 	Diego
> 
> 
> 
> 
> 	[1]
> 	03/28/17 11:33:34 Timed out requesting claim
> glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx
> <127.0.0.1:21711>#1490692382#1#... for ddavila after
> REQUEST_CLAIM_TIMEOUT=240 seconds.
> 	03/28/17 11:33:34 Match record
> (glidein_3722464_389738448@wn-a3-18-00.brunel.ac.uk
> <127.0.0.1:21711>#1490692382#1#... for ddavila, 171.2) deleted
> 	03/28/17 11:33:34 Canceling request for claim
> glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx
> <127.0.0.1:21711>#1490692382#1#... for ddavila 171.2
> 	03/28/17 11:33:34 SECMAN: resuming command 442 REQUEST_CLAIM to
> startd glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx
> <127.0.0.1:21711>#1490692382#1#... for ddavila from TCP port -1
> (non-blocking).
> 	03/28/17 11:33:34 SECMAN: TCP connection to startd
> glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx
> <127.0.0.1:21711>#1490692382#1#... for ddavila failed.
> 	03/28/17 11:33:34 Failed to send REQUEST_CLAIM to startd
> glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx
> <127.0.0.1:21711>#1490692382#1#... for ddavila: SECMAN:2003:TCP connection
> to startd glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx
> <127.0.0.1:21711>#1490692382#1#... for ddavila failed.|CEDAR:6007:operation
> was canceled
> 	03/28/17 11:33:34 CLOSE TCP <[2001:1458:201:e4::100:62c]:16101>
> fd=17
> 
> 	[2]
> 	03/28/17 11:33:35 CCB: received request id 19416 from SCHEDD
> <188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>
> on <[2001:1458:201:e4::100:62c]:40045> for target ccbid 17198
> (registered as STARTD
> <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP>
> on <[2001:630:10:f001::19a0]:8346>)
> 	03/28/17 11:33:35 Address rewriting: refused for attribute MyAddress
> (MyAddress =
> "<188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>"):
> the address isn't my default address. (Default:
> <188.185.81.179:9644?addrs=[2001-1458-d00-2--100-1ad]-9644+188.185.81.179-9644>,
> found in ad:
> <188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>)
> 	03/28/17 11:33:35 encrypting secret
> 	03/28/17 11:33:35 condor_write(fd=22 STARTD
> <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP>
> on
> <[2001:630:10:f001::19a0]:8346>,,size=408,timeout=1,flags=0,non_blocking=0)
> 	03/28/17 11:34:31 condor_read(fd=22 STARTD
> <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP>
> on
> <[2001:630:10:f001::19a0]:8346>,,size=21,timeout=1,flags=0,non_blocking=1)
> 	03/28/17 11:34:31 condor_read(fd=22 STARTD
> <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP>
> on
> <[2001:630:10:f001::19a0]:8346>,,size=263,timeout=1,flags=0,non_blocking=1)
> 	03/28/17 11:34:31 encrypting secret
> 	03/28/17 11:34:31 CCB: received error from target daemon STARTD
> <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP>
> on <[2001:630:10:f001::19a0]:8346> with ccbid 17198 for
> request 19415 from (client which has gone away): failed to connect
> 	03/28/17 11:34:31 CCB: client for request 19415 to target daemon
> STARTD
> <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP>
> on <[2001:630:10:f001::19a0]:8346> with ccbid 17198
> disappeared before receiving error details.
> 	03/28/17 11:35:02 CollectorAd  : Updating ... "< Personal Condor at
> vocms0803.cern.ch@xxxxxxxxxxxxxxxxx  >"
> 	03/28/17 11:35:02 Trying to update collector
> <[2001:1458:201:e4::100:535]:9618>
> 	03/28/17 11:35:02 Attempting to send update via UDP to collector
> vocms0807.cern.ch <[2001:1458:201:e4::100:535]:9618>
> 	03/28/17 11:35:02 Guess address string for host =
> <[2001:1458:201:e4::100:535]:9618>, port = 0
> 	03/28/17 11:35:02 it was sinful string. ip =
> 2001:1458:201:e4::100:535, port = 9618
> 	03/28/17 11:35:02 _condorOutMsg MTU changed from default to 60000
> 	03/28/17 11:35:02 SECMAN: command 19 UPDATE_COLLECTOR_AD to
> collector vocms0807.cern.ch:9618 from UDP port 32109 (blocking, raw).
> 	03/28/17 11:35:02 SECMAN: no cached key for
> {<[2001:1458:201:e4::100:535]:9618>,<19>}.
> 	03/28/17 11:35:02 SECMAN: Security Policy:
> 
> 
> 	[3]
> 	03/28/17 09:44:49 (pid:3625703) attempt to connect to
> <131.225.205.29:9668> failed: Network is unreachable (connect errno = 101).
> 	03/28/17 09:44:49 (pid:3625703) ERROR: SECMAN:2003:TCP connection to
> collector cmssrv215.fnal.gov:9668 failed.
> 	03/28/17 09:44:49 (pid:3625703) Failed to start non-blocking update
> to <131.225.205.29:9668>.
> 	03/28/17 09:48:26 (pid:3625703) attempt to connect to
> <188.184.94.50:4080> failed: Network is unreachable (connect errno = 101).
> Will keep trying for 300 total seconds (300 to go).
> 
> 	03/28/17 09:49:03 (pid:3625703) attempt to connect to
> <188.184.94.50:4080> failed: Network is unreachable (connect errno = 101).
> 
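
For reading the sinful strings in log [2], here is a minimal Python sketch, not HTCondor code. It assumes, purely from the log lines above, that the "addrs" parameter separates entries with "+", writes "-" in place of ":", and puts the port after the last "-". The function names parse_sinful and decode_addrs are hypothetical. It only illustrates the observation in the thread: the schedd's primary address is IPv4 while "addrs" carries both families.

```python
import ipaddress


def parse_sinful(sinful):
    """Split '<host:port?key=value&flag>' into (primary, params).

    Parsed by hand on purpose: a generic query-string parser would
    turn the '+' separators inside 'addrs' into spaces.
    """
    body = sinful.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    primary, _, query = body.partition("?")
    params = {}
    for item in query.split("&"):
        if item:
            key, _, value = item.partition("=")
            params[key] = value
    return primary, params


def decode_addrs(value):
    """Decode an 'addrs' value into a list of (ip_address, port).

    Assumption taken from the logs: entries are joined by '+', the
    port follows the last '-', and IPv6 literals sit in brackets
    with '-' standing in for ':'.
    """
    decoded = []
    for entry in value.split("+"):
        host, _, port = entry.rpartition("-")
        if host.startswith("[") and host.endswith("]"):
            host = host[1:-1].replace("-", ":")
        decoded.append((ipaddress.ip_address(host), int(port)))
    return decoded


# Schedd address exactly as it appears in the CCB log [2].
schedd = ("<188.184.94.50:4080?addrs=188.184.94.50-4080+"
          "[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>")

primary, params = parse_sinful(schedd)
print("primary:", primary)
for ip, port in decode_addrs(params["addrs"]):
    print("  IPv%d %s port %d" % (ip.version, ip, port))
```

Run against the startd string from the same log, the same helpers list the IPv6 entry first and 127.0.0.1 second, which may help when comparing what each side advertises.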
