Hi Greg,
It looks like that has worked but it's difficult to tell whether this was down to changing NETWORK_INTERFACE or just restarting the Condor service. I'll be more certain after I've pushed the new config files out to all the PCs and we'll see if the loopback addresses still appear. Will report back then.
thanks,
-ian.
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Hitchen, Greg (IM&T, Kensington WA) <Greg.Hitchen@xxxxxxxx>
Sent: 29 May 2020 01:27
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] execute hosts advertise loopback address
Hi Ian
Have you tried putting IP subnet info in NETWORK_INTERFACE, rather than just *?
e.g. NETWORK_INTERFACE = 138.253.*
I think in the dim dark past we had a similar intermittent issue, but we have never had problems since adding our network subnets, at least on our Windows machines.
Linux VMs (VMWare, vSphere, ESX servers) still require a cron job to check the condor network binding as they occasionally come up bound to the loopback address after outages/rebooting.
Cheers
Greg
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Smith, Ian
Hello Again,
I've now had the chance to log in remotely to a few of the Windows execute hosts and find pretty much the same as below. Running
condor_config_val IP_ADDRESS
always returns the correct IP address even if the loopback address is advertised. On restarting the HTCondor service the correct address then gets advertised (this seems to be repeatable).
The service is set as Automatic (delayed start) with a dependency on DHCP. If anyone knows a way of delaying this further (or restarting it automatically), I'd be grateful to hear it.
As a workaround, I'm going to set things up so that I can restart the HTCondor processes on the execute hosts remotely where machines advertise the loopback address. Not ideal - but hopefully an improvement.
regards,
-ian.
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Craig Parker <craig.parker@xxxxxxxxx>
Just chiming in quickly to say I have a similar sounding issue on my Win10 Condor clients, running Condor 8.8.3. I have a quick audit script running across my machines at present, and I had around 12 percent of them advertising 127.0.0.1 today.
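For anyone wanting to run a similar audit, here is a minimal sketch in Python. It is not the script mentioned above, just one way it might be done. It assumes `condor_status -af Machine MyAddress` prints one machine name and one address string of the form `<ip:port?...>` per line; the function name `loopback_hosts` is made up for illustration.

```python
import ipaddress
import re
import subprocess

# Pulls the host out of an address string such as "<127.0.0.1:9618?...>"
# or "<[::1]:9618?...>". The exact string layout is an assumption.
_SINFUL_HOST = re.compile(r"<(?:\[([^\]]+)\]|([^:\[\]>]+)):\d+")


def loopback_hosts(status_output):
    """Return machine names whose advertised address is a loopback address."""
    affected = []
    for line in status_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        machine, my_address = parts
        match = _SINFUL_HOST.search(my_address)
        if not match:
            continue
        host = match.group(1) or match.group(2)
        try:
            is_loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            continue  # host part is not a literal IP address
        # A machine can appear once per slot, so list it only once.
        if is_loopback and machine not in affected:
            affected.append(machine)
    return affected


if __name__ == "__main__":
    # Query the collector; requires condor_status on the PATH.
    result = subprocess.run(
        ["condor_status", "-af", "Machine", "MyAddress"],
        capture_output=True, text=True, check=True,
    )
    for name in loopback_hosts(result.stdout):
        print(name)
```

The parsing is kept separate from the `condor_status` call so it can be tried on saved output without a live pool.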
Running 'condor_config_val IP_ADDRESS' on an affected machine always returns the correct IP address.
It seems to be related to machines coming out of sleep. A service restart or PC restart always fixes it, and honestly all I’ve done with it so far is to automate a restart of the Condor service if the client's 'shared_port_ad' file has the loopback address in it.
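A rough sketch of that kind of automated restart, in Python, in case it is useful to others. The assumptions here are mine: the location of the 'shared_port_ad' file is site-specific and has to be passed in, the Windows service is taken to be named `condor`, and the function names are purely illustrative.

```python
import re
import subprocess
import sys
from pathlib import Path

# Any 127.x.x.x address that is not part of a longer dotted number.
LOOPBACK = re.compile(r"(?<![\d.])127(?:\.\d{1,3}){3}(?!\d)")


def advertises_loopback(ad_text):
    """True if the ad text contains a loopback IPv4 address."""
    return bool(LOOPBACK.search(ad_text))


def restart_condor_service(service="condor"):
    """Stop and start the Windows service (service name is an assumption)."""
    # Stopping may fail if the service is already down, so do not raise here.
    subprocess.run(["net", "stop", service], check=False)
    subprocess.run(["net", "start", service], check=True)


def check_and_restart(ad_path, restart=restart_condor_service):
    """Restart the service if the ad file mentions a loopback address.

    Returns True if a restart was triggered, False otherwise.
    """
    path = Path(ad_path)
    if not path.is_file():
        return False
    if advertises_loopback(path.read_text(errors="replace")):
        restart()
        return True
    return False


if __name__ == "__main__":
    # Usage: python check_loopback.py <path to shared_port_ad file>
    restarted = check_and_restart(sys.argv[1])
    print("restarted" if restarted else "no action")
```

Run from a scheduled task with administrator rights, this would only touch the service when the ad file actually shows the loopback address.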
We’re back on campus now though, with a little time on our hands, so I hope to investigate this properly in the near future. I’ll report any findings here.
Cheers, Craig