Re: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in ITB pool


Date: Tue, 28 Mar 2017 18:08:26 +0000
From: Zach Miller <zmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in ITB pool
Okay, thanks.  I'll wait to hear from you before I look into it further.


Cheers,
-zach


> -----Original Message-----
> From: Brian Bockelman [mailto:bbockelm@xxxxxxxxxxx]
> Sent: Tuesday, March 28, 2017 1:05 PM
> To: Zach Miller <zmiller@xxxxxxxxxxx>
> Cc: Condor Developers <htcondor-devel@xxxxxxxxxxx>
> Subject: Re: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in ITB pool
> 
> Actually, hold on there ...
> 
> No one is able to confirm (yet) that they actually upgraded the
> condor_startd version to one that supports IPv6 as was suggested (grumble
> grumble).  Let me get precise versions of all involved components (CCB,
> schedd, startd) to avoid setting you off on a goose chase (domestic or
> otherwise).
> 
> 
> Brian
> 
> 
> 	On Mar 28, 2017, at 12:55 PM, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
> 
> 	Huh.  Although I am familiar with the security side of things, I
> have to admit I have no experience with IPv6.  I will need to investigate,
> probably with Todd Miller's help.  Thanks for the report and I will get
> back to you.
> 
> 
> 	Cheers,
> 	-zach
> 
> 
> 
> 
> 		-----Original Message-----
> 		From: HTCondor-devel [mailto:htcondor-devel-bounces@xxxxxxxxxxx] On Behalf Of Brian Bockelman
> 		Sent: Tuesday, March 28, 2017 12:44 PM
> 		To: Condor Developers <htcondor-devel@xxxxxxxxxxx>
> 		Subject: [HTCondor-devel] Fwd: [Osg-gfactory-support] About IPv6 tests in ITB pool
> 
> 		Hi HTCondor folk,
> 
> 		The claim from the CMS pilot operators is that the following does not
> 		match IPv6 addresses:
> 
> 		ALLOW_DAEMON=*
> 
> 		(They've had to explicitly list each worker node's IP address to move
> 		forward in testing...)
> 
> 		Can someone confirm / deny that fact?
> 
> 		Additionally, can someone look at the CCB log [2] below?  Seems the
> 		connection reversing of the startd back to schedd is attempting to go
> 		over v4, despite this being a V6-only host.  MyAddress as sent by the
> 		CCB contains both V4 and V6; V4 appears to be selected.  Thoughts?
> 
> 		Thanks,
> 
> 		Brian
> 
> 
> 
> 		Begin forwarded message:
> 
> 		From: Diego Davila Foyo <diego.davila@xxxxxxx>
> 
> 		Subject: RE: [Osg-gfactory-support] About IPv6 tests in ITB pool
> 
> 		Date: March 28, 2017 at 7:30:24 AM CDT
> 
> 		To: Edgar M Fajardo Hernandez <emfajardohernandez@xxxxxxxxxxxxxxxx>
> 
> 		Cc: Jeffrey Michael Dost <jdost@xxxxxxxx>,
> 		"bbockelm@xxxxxxxxxxx" <bbockelm@xxxxxxxxxxx>,
> 		Marian Zvada <Marian.Zvada@xxxxxxx>,
> 		"Farrukh Aftab Khan" <farrukh.aftab.khan@xxxxxxx>,
> 		"emfajard@xxxxxxxx" <emfajard@xxxxxxxx>,
> 		"osg-gfactory-support@xxxxxxxxxxxxxxxx" <osg-gfactory-support@xxxxxxxxxxxxxxxx>,
> 		Todor Trendafilov Ivanov <todor.trendafilov.ivanov@xxxxxxx>,
> 		Andrea Sciaba <Andrea.Sciaba@xxxxxxx>,
> 		Duncan Rand <duncan.rand@xxxxxxxxxxxxxx>,
> 		Marco Mascheroni <marco.mascheroni@xxxxxxx>,
> 		Raul Cardoso Lopes <raul.cardoso.lopes@xxxxxxx>
> 
> 
> 		Thank you Edgar, you were right about ALLOW_WRITE, but setting
> 		ALLOW_DAEMON = $(ALLOW_DAEMON),pilot04@cms didn't work. I had to add
> 		Brunel's IPv6 address explicitly to ALLOW_DAEMON to get it to work.
> 
> 		After setting ALLOW_DAEMON = $(ALLOW_DAEMON),2001:630:10:f001::19a0
> 		in both the CCB and the Collector, I started to see glideins
> 		connecting back to the collector. I also had to apply a special HA
> 		setting to prevent the negotiator from going to the Fermi Central
> 		Manager whenever I set PREFER_IPV4=False.
> 
> 
> 		I have set up one schedd for IPv6 and sent some test jobs. The
> 		negotiation process now goes well, but I see a problem with the
> 		claim process. In the logs I can see the following:
> 		Schedd [1]
> 		CCB [2]
> 		Startd [3]
> 
> 		What I find strange is that the Startd is trying to connect to the
> 		Schedd (188.184.94.50) using IPv4. We couldn't find any reference to
> 		the IPv6 address of the schedd within the logs. Any thoughts?
> 
> 		Regards,
> 
> 		Diego
> 
> 
> 
> 
> 		[1]
> 		03/28/17 11:33:34 Timed out requesting claim glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila after REQUEST_CLAIM_TIMEOUT=240 seconds.
> 		03/28/17 11:33:34 Match record (glidein_3722464_389738448@wn-a3-18-00.brunel.ac.uk <127.0.0.1:21711>#1490692382#1#... for ddavila, 171.2) deleted
> 		03/28/17 11:33:34 Canceling request for claim glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila 171.2
> 		03/28/17 11:33:34 SECMAN: resuming command 442 REQUEST_CLAIM to startd glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila from TCP port -1 (non-blocking).
> 		03/28/17 11:33:34 SECMAN: TCP connection to startd glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila failed.
> 		03/28/17 11:33:34 Failed to send REQUEST_CLAIM to startd glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila: SECMAN:2003:TCP connection to startd glidein_3722464_389738448@xxxxxxxxxxxxxxxxxxxxxxxx <127.0.0.1:21711>#1490692382#1#... for ddavila failed.|CEDAR:6007:operation was canceled
> 		03/28/17 11:33:34 CLOSE TCP <[2001:1458:201:e4::100:62c]:16101> fd=17
> 
> 		[2]
> 		03/28/17 11:33:35 CCB: received request id 19416 from SCHEDD <188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179> on <[2001:1458:201:e4::100:62c]:40045> for target ccbid 17198 (registered as STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>)
> 		03/28/17 11:33:35 Address rewriting: refused for attribute MyAddress (MyAddress = "<188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>"): the address isn't my default address. (Default: <188.185.81.179:9644?addrs=[2001-1458-d00-2--100-1ad]-9644+188.185.81.179-9644>, found in ad: <188.184.94.50:4080?addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>)
> 		03/28/17 11:33:35 encrypting secret
> 		03/28/17 11:33:35 condor_write(fd=22 STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>,,size=408,timeout=1,flags=0,non_blocking=0)
> 		03/28/17 11:34:31 condor_read(fd=22 STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>,,size=21,timeout=1,flags=0,non_blocking=1)
> 		03/28/17 11:34:31 condor_read(fd=22 STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346>,,size=263,timeout=1,flags=0,non_blocking=1)
> 		03/28/17 11:34:31 encrypting secret
> 		03/28/17 11:34:31 CCB: received error from target daemon STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346> with ccbid 17198 for request 19415 from (client which has gone away): failed to connect
> 		03/28/17 11:34:31 CCB: client for request 19415 to target daemon STARTD <127.0.0.1:21711?addrs=[2001-630-10-f001--19a0]-21711+127.0.0.1-21711&noUDP> on <[2001:630:10:f001::19a0]:8346> with ccbid 17198 disappeared before receiving error details.
> 		03/28/17 11:35:02 CollectorAd  : Updating ... "< Personal Condor at vocms0803.cern.ch@xxxxxxxxxxxxxxxxx >"
> 		03/28/17 11:35:02 Trying to update collector <[2001:1458:201:e4::100:535]:9618>
> 		03/28/17 11:35:02 Attempting to send update via UDP to collector vocms0807.cern.ch <[2001:1458:201:e4::100:535]:9618>
> 		03/28/17 11:35:02 Guess address string for host = <[2001:1458:201:e4::100:535]:9618>, port = 0
> 		03/28/17 11:35:02 it was sinful string. ip = 2001:1458:201:e4::100:535, port = 9618
> 		03/28/17 11:35:02 _condorOutMsg MTU changed from default to 60000
> 		03/28/17 11:35:02 SECMAN: command 19 UPDATE_COLLECTOR_AD to collector vocms0807.cern.ch:9618 from UDP port 32109 (blocking, raw).
> 		03/28/17 11:35:02 SECMAN: no cached key for {<[2001:1458:201:e4::100:535]:9618>,<19>}.
> 		03/28/17 11:35:02 SECMAN: Security Policy:
> 
> 
> 		[3]
> 		03/28/17 09:44:49 (pid:3625703) attempt to connect to <131.225.205.29:9668> failed: Network is unreachable (connect errno = 101).
> 		03/28/17 09:44:49 (pid:3625703) ERROR: SECMAN:2003:TCP connection to collector cmssrv215.fnal.gov:9668 failed.
> 		03/28/17 09:44:49 (pid:3625703) Failed to start non-blocking update to <131.225.205.29:9668>.
> 		03/28/17 09:48:26 (pid:3625703) attempt to connect to <188.184.94.50:4080> failed: Network is unreachable (connect errno = 101).  Will keep trying for 300 total seconds (300 to go).
> 
> 		03/28/17 09:49:03 (pid:3625703) attempt to connect to <188.184.94.50:4080> failed: Network is unreachable (connect errno = 101).
> 
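
The address strings in log [2] above each carry an `addrs=` list, and it may help to see which candidate addresses they advertise. The following is only a rough Python sketch, not HTCondor code: the function name `parse_addrs` is made up for illustration, and the encoding it assumes (entries separated by `+`, colons in IPv6 addresses written as `-`, port after the last `-`) is inferred from the log lines themselves rather than from any specification.

```python
import ipaddress


def parse_addrs(sinful):
    """Return (ip, port) pairs from the addrs= parameter of an address string.

    Hypothetical helper: the format is inferred from the log lines in this
    thread, e.g. addrs=188.184.94.50-4080+[2001-1458-201-e4--100-62c]-4080
    """
    inner = sinful.strip("<>")
    _, _, query = inner.partition("?")
    result = []
    for param in query.split("&"):
        key, _, value = param.partition("=")
        if key != "addrs":
            continue
        for entry in value.split("+"):
            # The port follows the last '-' in each entry.
            host, _, port = entry.rpartition("-")
            if host.startswith("["):
                # Bracketed IPv6 entry: '-' stands in for ':'.
                host = host.strip("[]").replace("-", ":")
            result.append((ipaddress.ip_address(host), int(port)))
    return result


# MyAddress of the schedd, copied from log [2].
schedd = ("<188.184.94.50:4080?addrs=188.184.94.50-4080+"
          "[2001-1458-201-e4--100-62c]-4080&noUDP&sock=23745_4d36_179>")

for ip, port in parse_addrs(schedd):
    print("IPv%d %s port %d" % (ip.version, ip, port))
```

Run on the schedd's MyAddress, this lists the IPv4 address first and the IPv6 address second, which matches the observation in the thread that both families are advertised. Whether list order influences which one the startd tries is not established here.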
