Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] Selected NETWORK_INTERFACE not always honored?

Date: Tue, 06 Feb 2007 15:14:22 +0000
From: Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx>
Subject: [Condor-users] Selected NETWORK_INTERFACE not always honored?

Hi,

We seem to have cases where the NETWORK_INTERFACE defined is not always honored, and I'd be grateful if anyone can shed some light as to why this might happen. Our grid uses many flocked pools, all of which operate on a number of RFC 1918 addresses in 172.24.*.* networks, so even if machines have global IP addresses (on, say, eth0), they are given additional virtual interfaces on eth0:1 but using the same physical NIC. Condor is then forced to bind to these addresses by setting NETWORK_INTERFACE in the condor_config.local files. However, there seem to be times when a submit machine may use its global IP address instead to communicate with an execute node, causing it to fall foul of any firewall in between which is configured to only allow through the traffic originating from RFC 1918 addresses. Here's a snippet from a ShadowLog when this happens:


2/6 01:40:02 (65.172) (10961):connect returns -1, errno = 110
2/6 01:40:02 (65.172) (10961):failed to connect to scheduler on <172.24.116.237:10265>
2/6 01:40:04 (65.159) (10962):connect returns -1, errno = 110
2/6 01:40:04 (65.159) (10962):failed to connect to scheduler on <172.24.116.251:10439>
2/6 01:40:08 (65.153) (10964):connect returns -1, errno = 110
2/6 01:40:08 (65.153) (10964):failed to connect to scheduler on <172.24.116.224:9640>
2/6 01:40:10 (65.152) (10965):connect returns -1, errno = 110
2/6 01:40:10 (65.152) (10965):failed to connect to scheduler on <172.24.116.199:9675>
2/6 01:40:12 (65.166) (10966):connect returns -1, errno = 110
2/6 01:40:12 (65.166) (10966):failed to connect to scheduler on <172.24.116.233:9825>

The log for the firewall protecting the execute nodes confirms that these connections were attempted by the submit host on its global IP address, and not the one set by NETWORK_INTERFACE. This occurs after the jobs have started, so initial comms must have used the correct source address to get through, which is what I find confusing. Why should the submit host suddenly start using its other, non-nominated, IP address? I'm also confused as to why the submit node is trying to communicate with a *scheduler*, but one thing at a time. These are all Linux boxes running 6.8.3.
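
To make the source-address question concrete, here is a generic Python sketch (not Condor's code, and not a claim about how the shadow opens its sockets). On Linux, errno 110 is ETIMEDOUT, which is what a client sees when a firewall silently drops its packets. A client socket that calls bind() before connect() leaves from the address it was bound to; one that skips the bind lets the kernel pick the source address from the routing table, which on a multi-homed host may be the global one. The function names connect_from and describe_errno are made up for illustration.

```python
import errno
import socket


def connect_from(local_ip, remote_host, remote_port, timeout=5.0):
    """Open a TCP connection whose source address is pinned to local_ip.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Port 0 lets the kernel choose an ephemeral port; only the IP is pinned.
        # Without this bind(), the kernel would also choose the source IP.
        sock.bind((local_ip, 0))
        sock.connect((remote_host, remote_port))
    except OSError:
        sock.close()
        raise
    return sock


def describe_errno(code):
    """Return the symbolic name of an errno value on this platform."""
    return errno.errorcode.get(code, "unknown")
```

Calling sock.getsockname() on the returned socket shows the source address actually in use, which can be compared with what the firewall log reports.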


Thanks for any help,
Mark
