[Condor-users] Problems with a dual homed condor server
- Date: Thu, 19 May 2005 13:46:49 -0700
- From: Terrence Martin <tmartin@xxxxxxxxxxxxxxxx>
- Subject: [Condor-users] Problems with a dual homed condor server
I am trying to deploy a condor schedd on a dual homed system. The system
will take connections from the outside through grid middleware and then
run a local condor_submit to submit a job to a central collector via the
local schedd. The condor schedd should communicate with the collector and
the node startd(s) only on the internal interface, which has a private IP
address.
Here is my error.
5/19 12:18:53 Using config file: /etc/condor/condor_config
5/19 12:18:53 Using local config files:
/data/osg-0.1.5/condor/local.t2data4/condor_config.local
5/19 12:18:53 DaemonCore: Command Socket at <192.168.1.14:32791>
5/19 12:18:54 Started DaemonCore process
"/data/osg-0.1.5/condor/sbin/condor_schedd", pid and pgroup = 3183
5/19 12:19:00 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32808>
5/19 12:19:00 DC_AUTHENTICATE: auth ip -> 198.202.74.80
5/19 12:19:00 DC_AUTHENTICATE: ERROR: IP not in agreement!!! BAILING!
5/19 12:38:30 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32985>
5/19 12:38:30 DC_AUTHENTICATE: auth ip -> 198.202.74.80
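To find every occurrence of this mismatch in a longer log, the sock ip and auth ip pairs can be compared mechanically. This is a minimal sketch in Python that assumes the log line format shown above. The function name find_ip_mismatches is hypothetical and is not part of Condor.

```python
import re

# Patterns for the two DC_AUTHENTICATE line shapes seen in the log above.
SOCK_RE = re.compile(r"DC_AUTHENTICATE: sock ip -> <([\d.]+):\d+>")
AUTH_RE = re.compile(r"DC_AUTHENTICATE: auth ip -> ([\d.]+)")


def find_ip_mismatches(lines):
    """Return (sock_ip, auth_ip) pairs where the two addresses differ.

    Each "auth ip" line is paired with the most recent "sock ip" line.
    """
    mismatches = []
    sock_ip = None
    for line in lines:
        m = SOCK_RE.search(line)
        if m:
            sock_ip = m.group(1)
            continue
        m = AUTH_RE.search(line)
        if m and sock_ip is not None:
            if m.group(1) != sock_ip:
                mismatches.append((sock_ip, m.group(1)))
            sock_ip = None  # this pair has been consumed
    return mismatches


# Lines copied from the log excerpt above.
log = [
    "5/19 12:19:00 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32808>",
    "5/19 12:19:00 DC_AUTHENTICATE: auth ip -> 198.202.74.80",
    "5/19 12:19:00 DC_AUTHENTICATE: ERROR: IP not in agreement!!! BAILING!",
    "5/19 12:38:30 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32985>",
    "5/19 12:38:30 DC_AUTHENTICATE: auth ip -> 198.202.74.80",
]

for sock_ip, auth_ip in find_ip_mismatches(log):
    print(f"sock ip {sock_ip} does not match auth ip {auth_ip}")
```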
I have run condor on multi-homed machines before and never seen this
error. It looks like condor is using the first interface (198.202.74.80)
for the auth ip but the configured one (192.168.1.14) for the sock ip.
How can this be if I specify my NETWORK_INTERFACE?
Here are my configs
/etc/condor/condor_config:
# collector
CONDOR_HOST = t2cdf01.local
/data/osg-0.1.5/condor/local.t2data4/condor_config.local:
NETWORK_INTERFACE = 192.168.1.14
CONDOR_HOST = t2cdf01.local
Here are a few details of my network setup
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:81:FC:BE
          inet addr:198.202.74.80  Bcast:198.202.74.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:70432 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42263 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19460279 (18.5 Mb)  TX bytes:10153427 (9.6 Mb)
          Interrupt:18
eth1      Link encap:Ethernet  HWaddr 00:30:48:81:FC:BF
          inet addr:192.168.1.14  Bcast:192.168.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:165956 errors:0 dropped:0 overruns:0 frame:0
          TX packets:712 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15639240 (14.9 Mb)  TX bytes:110828 (108.2 Kb)
          Interrupt:19
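As a sanity check on which interface each address in the log belongs to, the addresses and masks from the ifconfig output above can be fed to Python's ipaddress module. This is only a sketch; the helper name owning_interface is made up for illustration.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output above.
INTERFACES = {
    "eth0": ipaddress.ip_interface("198.202.74.80/255.255.255.0"),
    "eth1": ipaddress.ip_interface("192.168.1.14/255.255.0.0"),
}


def owning_interface(addr, interfaces=INTERFACES):
    """Return the name of the interface whose subnet contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for name, iface in interfaces.items():
        if ip in iface.network:
            return name
    return None


# The sock ip and the auth ip from the log land on different interfaces.
for addr in ("192.168.1.14", "198.202.74.80"):
    print(addr, "->", owning_interface(addr))
```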
# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.1.14 t2data4.sdsc.edu t2data4
# hostname
t2data4
# cat /etc/resolv.conf
search local
nameserver 192.168.21.1
nameserver 192.168.1.13
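One thing that can be checked from the machine itself is which address the local hostname actually resolves to, given the hosts file and resolver settings above. The sketch below uses Python's standard socket module. Whether Condor derives the auth ip from this kind of lookup is an assumption, not something the log confirms, and the helper name check_hostname is hypothetical.

```python
import socket


def check_hostname(expected_ip, name=None, resolver=socket.gethostbyname_ex):
    """Check whether `name` resolves to `expected_ip`.

    name defaults to the local hostname. resolver is injectable so the
    check can be exercised without touching the real resolver.
    Returns (matches, addresses).
    """
    if name is None:
        name = socket.gethostname()
    _canonical, _aliases, addresses = resolver(name)
    return expected_ip in addresses, addresses


if __name__ == "__main__":
    # 192.168.1.14 is the NETWORK_INTERFACE value from condor_config.local.
    try:
        ok, addresses = check_hostname("192.168.1.14")
        print("hostname resolves to", addresses, "match:", ok)
    except OSError as err:
        print("hostname lookup failed:", err)
```

If the lookup returns the public address rather than the internal one, that would at least be consistent with the mismatch in the log.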
I have tried various permutations of hosts files and hostname settings,
rebooted, etc.
The only explanation I have is that something is still ignoring the
NETWORK_INTERFACE directive and just picking the first network
interface. My next step is to swap network cables and reconfigure the
interfaces so that the internal interface comes up first. That means a
trip to the computer center. Is there no way to strictly specify my
internal IP for this auth ip via a config file? Note that if I drop the
NETWORK_INTERFACE directive, things work fine, at least locally;
everything just uses the first interface.
Thanks
Terrence
UCSD