Re: [Condor-users] Issue with connecting nodes to pool/master
- Date: Tue, 27 Jun 2006 19:25:02 -0400 (EDT)
- From: "Robert Wright" <dc0@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Issue with connecting nodes to pool/master
>> > What machine is the log file below from? You should only have
>> > a NegotiatorLog on one machine, the central manager.
>> 1.101.
>>
>
> Well, there's problem 1. Turn off your negotiator on 101. You should only
> have a negotiator on 102.
Done
>> > Errno 113 is "No Route To Host". Do you have your networking properly
>> > configured (ie can you ping your central manager from all your other
>> > machines?)
>> All ICMP, UDP, and TCP traffic is passing properly. Name the service and
>> I am able to pass traffic to it: 21/22/23/80, etc.
>>
>
> 9618 :)
LoL
[root@node0 log]# ping TRANSLTR
PING TRANSLTR.netfeds.com (192.168.1.102) 56(84) bytes of data.
64 bytes from TRANSLTR.netfeds.com (192.168.1.102): icmp_seq=1 ttl=64 time=1.88 ms
[root@TRANSLTR log]# ping node0
PING node0.netfeds.com (192.168.1.101) 56(84) bytes of data.
64 bytes from node0.netfeds.com (192.168.1.101): icmp_seq=0 ttl=64 time=0.176 ms
>> 6/21 09:24:14 ERROR "Required attribute "START" is not defined" at line 255 in file util.C
>>
>
> There's another problem. Make sure that you have a
> START = <some valid expression> in either
> /usr/local/condor/etc/condor_config
Done. The "START" error is resolved; however, node0 still reports "no route
to host", as its startd log shows:
6/27 19:22:17 ******************************************************
6/27 19:22:17 ** condor_startd (CONDOR_STARTD) STARTING UP
6/27 19:22:17 ** /usr/local/condor/sbin/condor_startd
6/27 19:22:17 ** $CondorVersion: 6.6.11 Mar 23 2006 $
6/27 19:22:17 ** $CondorPlatform: I386-LINUX_RH9 $
6/27 19:22:17 ** PID = 2769
6/27 19:22:17 ******************************************************
6/27 19:22:17 Using config file: /usr/local/condor/etc/condor_config
6/27 19:22:17 Using local config files: /usr/local/condor/local.node0/condor_config.local
6/27 19:22:17 DaemonCore: Command Socket at <192.168.1.101:55977>
6/27 19:22:24 New machine resource allocated
6/27 19:22:24 Failed to obtain keyboard or mouse idle information.
6/27 19:22:24 Assuming the keyboard and mouse to be infinitely idle.
6/27 19:22:24 About to run initial benchmarks.
6/27 19:22:32 Completed initial benchmarks.
6/27 19:22:32 State change: IS_OWNER is false
6/27 19:22:32 Changing state: Owner -> Unclaimed
6/27 19:22:36 Can't connect to <192.168.1.102:9618>:0, errno = 113
6/27 19:22:36 Will keep trying for 10 seconds...
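
For reference, the numeric errno in the line above can be decoded on the
machine that logged it. This is only a sketch: describe_errno is a name made
up for this example, and errno numbering is platform-specific, so the value
113 should be looked up on the Linux host itself.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    # errno.errorcode maps numbers to names such as "EHOSTUNREACH";
    # unknown numbers fall back to a placeholder instead of raising.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 113 is the value from the startd log; its meaning depends on the OS,
    # so run this on the node that produced the log.
    print(describe_errno(113))
```
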
The log on the 'central manager' ;)
6/27 19:13:32 Housekeeper: Ready to clean old ads
6/27 19:13:32 Cleaning StartdAds ...
6/27 19:13:32 Cleaning StartdPrivateAds ...
6/27 19:13:32 Cleaning ScheddAds ...
6/27 19:13:32 Cleaning SubmittorAds ...
6/27 19:13:32 Cleaning LicenseAds ...
6/27 19:13:32 Cleaning MasterAds ...
6/27 19:13:32 Cleaning CkptServerAds ...
6/27 19:13:32 Cleaning CollectorAds ...
6/27 19:13:32 Cleaning StorageAds ...
6/27 19:13:32 Housekeeper: Done cleaning
6/27 19:13:33 (Sent 3 ads in response to query)
6/27 19:13:33 Got QUERY_STARTD_PVT_ADS
6/27 19:13:33 (Sent 1 ads in response to query)
6/27 19:18:33 (Sent 3 ads in response to query)
6/27 19:18:33 Got QUERY_STARTD_PVT_ADS
6/27 19:18:33 (Sent 1 ads in response to query)
----
Some testing
[root@node0 log]# telnet 192.168.1.102 22
Trying 192.168.1.102...
Connected to TRANSLTR.netfeds.com (192.168.1.102).
Escape character is '^]'.
SSH-2.0-OpenSSH_4.2
[root@TRANSLTR log]# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
~~~~~ TRUNCATED ~~~~
udp 0 0 TRANSLTR.netfeds.com:9614 *:*
udp 0 0 TRANSLTR.netfeds.com:9618 *:*
udp 0 0 TRANSLTR.netfeds.com:47251 *:*
udp 0 0 TRANSLTR.netfeds.com:44574 *:*
udp 0 0 TRANSLTR.netfeds.com:49952 *:*
udp 0 0 TRANSLTR.netfeds.com:44599 *:*
~~~~~ TRUNCATED ~~~~
[root@node0 log]# telnet 192.168.1.102 9618
Trying 192.168.1.102...
telnet: connect to address 192.168.1.102: No route to host
telnet: Unable to connect to remote host: No route to host
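
The two telnet tests above (port 22 connects, port 9618 does not) can be
repeated from any node with a small script. This is a minimal sketch, not
anything Condor ships: probe_tcp is a hypothetical helper, the 3-second
timeout is an arbitrary choice, and it tests TCP only, whereas the visible
part of the truncated netstat listing shows UDP sockets on 9618.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connect; return "open", "timeout", or an errno name."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one exists,
        # otherwise fall back to the exception text (e.g. resolver errors).
        return errno.errorcode.get(exc.errno, str(exc))


if __name__ == "__main__":
    # Address and ports are taken from the tests above.
    for port in (22, 9618):
        print(port, probe_tcp("192.168.1.102", port))
```

Running it on node0 and again on the central manager itself would show
whether the failure depends on where the connection originates.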
[root@TRANSLTR log]# ps -eef | grep condor
condor 3481 1 0 18:58 ? 00:00:00 /usr/local/condor/sbin/condor_master
condor 3482 3481 0 18:58 ? 00:00:00 condor_collector -f
condor 3483 3481 0 18:58 ? 00:00:00 condor_schedd -f
condor 3484 3481 0 18:58 ? 00:00:03 condor_startd -f
condor 3485 3481 0 18:58 ? 00:00:00 condor_negotiator -f
[root@node0 log]# ps -eef | grep condor
daemon 2766 1 0 19:22 ? 00:00:00 /usr/local/condor/sbin/condor_master
daemon 2767 2766 0 19:22 ? 00:00:00 condor_schedd -f
daemon 2769 2766 7 19:22 ? 00:00:08 condor_startd -f