hi all,
we are running a cluster with 600+ cpus. the head node has two
interfaces: one facing the internet (128.180.2.45) and the other on a
private net (192.168.*.*). users log into this node to submit their
jobs. all the other nodes in the cluster are on the private net.
everything seems fine except for 10 nodes in the cluster. these nodes
have ip addresses 192.168.1.10 through 192.168.1.19 (and hostnames
blaze10 through blaze19). if i do the following on the head node:
[asm4@blaze1 ~]$ condor_status blaze10 -l | grep IpAdd
PublicNetworkIpAddr = "<128.180.2.450:56927>"
StartdIpAddr = "<128.180.2.450:56927>"
PublicNetworkIpAddr = "<128.180.2.450:56927>"
StartdIpAddr = "<128.180.2.450:56927>"
similarly, blaze11 shows ip address 128.180.2.451 in condor_status on
blaze1, and so on. however, the same command, when run on some other
node, say blaze2, gives:
[asm4@blaze2 ~]$ condor_status blaze10 -l | grep IpAdd
PublicNetworkIpAddr = "<192.168.1.10:56927>"
StartdIpAddr = "<192.168.1.10:56927>"
PublicNetworkIpAddr = "<192.168.1.10:56927>"
StartdIpAddr = "<192.168.1.10:56927>"
which is the correct address.
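for reference, the addresses reported on the head node (128.180.2.450,
128.180.2.451, ...) are not valid ipv4 addresses at all, since the last
octet exceeds 255. a minimal python sketch for flagging such entries in
`condor_status -l` style output is below. the function name, the regex
and the sample text are illustrative assumptions, not part of condor:

```python
import ipaddress
import re

# Matches attribute lines such as: StartdIpAddr = "<192.168.1.10:56927>"
# (pattern is an assumption based on the output shown above)
SINFUL_RE = re.compile(r'^(\w+)\s*=\s*"<([^:>]+):(\d+)>"')


def find_bad_addresses(classad_text):
    """Return (attribute, host) pairs whose host is not a valid IPv4 address."""
    bad = []
    for line in classad_text.splitlines():
        match = SINFUL_RE.match(line.strip())
        if match is None:
            continue  # not an address attribute in the expected form
        attr, host = match.group(1), match.group(2)
        try:
            ipaddress.IPv4Address(host)
        except ipaddress.AddressValueError:
            bad.append((attr, host))
    return bad


# Sample lines taken from the two outputs above (head node vs. blaze2).
sample = '''PublicNetworkIpAddr = "<128.180.2.450:56927>"
StartdIpAddr = "<192.168.1.10:56927>"'''
print(find_bad_addresses(sample))
# prints [('PublicNetworkIpAddr', '128.180.2.450')]
```

run against the text that blaze1 returns for blaze10, this would flag
both attributes; run against the text from blaze2, it would flag none.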
in the NegotiatorLog on the head node i see the following, repeatedly:
6/4 20:24:32 Request 147588.00000:
6/4 20:24:32 Failed to initiate socket to send MATCH_INFO to slot2@xxxxxxxxxxxxxxxxxxxxx
6/4 20:24:32 Matched 147588.0 bad0@xxxxxxxxxxxxx <128.180.2.45:45179> preempting none <128.180.2.450:56927> slot2@xxxxxxxxxxxxxxxxxxxxx
6/4 20:24:32 Successfully matched with slot2@xxxxxxxxxxxxxxxxxxxxx
i can log into each of these 10 nodes, and their ip addresses seem to
be set correctly.
we have condor 7.0.1 running on all nodes (X86_64-LINUX_RHEL5). we also
have BIND_ALL_INTERFACES set to true because we were trying a few
things with flocking.
any ideas what could be wrong? thanks in advance.
--
regards
Ashutosh Mahajan
http://www.lehigh.edu/~asm4