ok.... now it appears i've really screwed things up.... running 'condor_status' bombs... in that it fails to connect to the collector...

[root@lserver5 etc]# condor_status
CEDAR:6001:Failed to connect to <192.168.1.55:9618>
Error: Couldn't contact the condor_collector on lserver5.

however, i can ping lserver5 (which is the machine itself)

[root@lserver5 etc]# ping lserver5
PING lserver5 (192.168.1.55) from 192.168.1.55 : 56(84) bytes of data.
64 bytes from lserver5 (192.168.1.55): icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from lserver5 (192.168.1.55): icmp_seq=2 ttl=64 time=0.034 ms

so.. what gives...

i changed the condor_config.local file to add the network_interface

######################################################################
## Local settings
######################################################################

######################################################################
## Place your own local configuration settings for your central
## manager here.

NETWORK_INTERFACE = 192.168.1.55

------------------------

i changed the /etc/hosts file to be:

[root@lserver5 etc]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.1.55 lserver5
192.168.1.57 lserver7

and now.. it appears i'm worse off than i was earlier....

any ideas/suggestions....

thanks...

bruce

ps. the /home/condor/log/MasterLog displays....
--------------------
10/7 09:17:06 DaemonCore: PERMISSION DENIED to unknown user from host <192.168.1.55:33806> for command 60005 (DC_OFF_GRACEFUL)
10/7 09:17:13 DaemonCore: Command received via TCP from host <192.168.1.55:33807>
10/7 09:17:13 DaemonCore: received command 60004 (DC_RECONFIG), calling handler (handle_reconfig())
10/7 09:17:13 Reconfiguring all running daemons.
10/7 09:17:13 Sent SIGHUP to COLLECTOR (pid 1428)
10/7 09:17:13 Sent SIGHUP to NEGOTIATOR (pid 1429)
10/7 09:17:13 Sent SIGHUP to STARTD (pid 1430)
10/7 09:17:13 Sent SIGHUP to SCHEDD (pid 1431)
10/7 09:17:13 Can't connect to <192.168.1.55:9618>:0, errno = 111
10/7 09:17:13 Will keep trying for 10 seconds...
10/7 09:17:23 Connect failed for 10 seconds; returning FALSE
10/7 09:17:23 ERROR: SECMAN:2003:TCP connection to <192.168.1.55:9618> failed
10/7 09:17:23 Can't send UPDATE_MASTER_AD to collector lserver5 <192.168.1.55:9618>: Failed to send UDP update command to collector
10/7 09:17:26 DaemonCore: PERMISSION DENIED to unknown user from host <192.168.1.55:33844> for command 453 (RESTART)
10/7 09:19:24 Can't connect to <192.168.1.55:9618>:0, errno = 111
10/7 09:19:24 Will keep trying for 10 seconds...
10/7 09:19:34 Connect failed for 10 seconds; returning FALSE
10/7 09:19:34 ERROR: SECMAN:2003:TCP connection to <192.168.1.55:9618> failed
10/7 09:19:34 Can't send UPDATE_MASTER_AD to collector lserver5 <192.168.1.55:9618>: Failed to send UDP update command to collector
10/7 09:24:34 Can't connect to <192.168.1.55:9618>:0, errno = 111
10/7 09:24:34 Will keep trying for 10 seconds...
10/7 09:24:44 Connect failed for 10 seconds; returning FALSE
10/7 09:24:44 ERROR: SECMAN:2003:TCP connection to <192.168.1.55:9618> failed

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
Sent: Thursday, October 07, 2004 9:08 AM
To: 'roy hill (IGER-WP)'; 'Condor-Users Mail List'
Subject: RE: [Condor-users] linux configuration....

roy,

how can i check/change the permissions to be read/written by 'condor' without screwing it up for other apps...

thanks...

-bruce

-----Original Message-----
From: roy hill (IGER-WP) [mailto:roy.hill@xxxxxxxxxxx]
Sent: Thursday, October 07, 2004 8:45 AM
To: bedouglas@xxxxxxxxxxxxx; Condor-Users Mail List
Subject: RE: [Condor-users] linux configuration....

Bruce,

Check the permissions on the hosts file; it needs to be readable by your Condor account.

Best regards,
Roy.

-----Original Message-----
From: bruce [mailto:bedouglas@xxxxxxxxxxxxx]
Sent: 07 October 2004 16:38
To: 'Condor-Users Mail List'
Subject: [Condor-users] linux configuration....

hi...

i have managed to get condor up/running on two linux boxes. however, when i attempt to do a 'condor_status' on the 'central manager' it shows only one machine. an examination of the /home/condor/log/CollectorLog file shows a warning referring to the /etc/hosts file. it appears that the 2nd machine is not able to 'see' the 'central manager' condor...

10/7 07:54:19 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
10/7 07:54:19 ** /opt/condor-6.6.6/sbin/condor_collector
10/7 07:54:19 ** $CondorVersion: 6.6.6 Jul 26 2004 $
10/7 07:54:19 ** $CondorPlatform: I386-LINUX_RH9 $
10/7 07:54:19 ** PID = 1428
10/7 07:54:19 ******************************************************
10/7 07:54:19 Using config file: /opt/condor-6.6.6/etc/condor_config
10/7 07:54:19 Using local config files: /home/condor/condor_config.local
10/7 07:54:19 DaemonCore: Command Socket at <127.0.0.1:9618>
10/7 07:54:19 WARNING: Condor is running on the loopback address (127.0.0.1)
10/7 07:54:19 of this machine, and is not visible to other hosts!
10/7 07:54:19 This may be due to a misconfigured /etc/hosts file.
10/7 07:54:19 Please make sure your hostname is not listed on the
10/7 07:54:19 same line as localhost in /etc/hosts.
10/7 07:54:19 In ViewServer::Init()
10/7 07:54:19 In CollectorDaemon::Init()
10/7 07:54:19 In ViewServer::Config()
10/7 07:54:19 In CollectorDaemon::Config()
10/7 07:54:19 enable: Creating stats hash table
10/7 07:54:19 (Sent 0 ads in response to query)
10/7 07:54:19 Got QUERY_STARTD_PVT_ADS
10/7 07:54:19 (Sent 0 ads in response to query)
10/7 07:54:20 WARNING: No master ad for < localhost.localdomain >
10/7 07:54:20 ScheddAd : Inserting ** "< localhost.localdomain , 127.0.0.1 >"
10/7 07:54:20 stats: Inserting new hashent for 'Schedd':'localhost.localdomain':'127.0.0.1'
10/7 07:54:24 ** Master < localhost.localdomain > rejuvenated from recently down
10/7 07:54:24 stats: Inserting new hashent for 'Master':'localhost.localdomain':'127.0.0.1'
10/7 07:54:32 StartdAd : Inserting ** "< localhost.localdomain , 127.0.0.1 >"
10/7 07:54:32 stats: Inserting new hashent for 'Start':'localhost.localdomain':'127.0.0.1'
10/7 07:54:32 StartdPvtAd : Inserting ** "< localhost.localdomain , 127.0.0.1 >"
10/7 07:54:32 stats: Inserting new hashent for 'StartdPvt':'localhost.localdomain':'127.0.0.1'

the /etc/hosts file is:

[root@lserver5 etc]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 lserver5 localhost.localdomain localhost
192.168.1.57 lserver7

is there something that i should do to the /etc/hosts file. why am i able to ping/access the 'central manager' machine (lserver5) from the client machine (lserver2) by simply 'ping lserver5'....

i can provide the relevant portion of the 'condor_config' file if needed. i'm really at a loss as to how to proceed!!!!!!!

thanks...

-bruce

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users
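
The CollectorLog warning above asks that the hostname not share a line with localhost in /etc/hosts. A minimal Python sketch of that check follows. It is not part of Condor; the function name loopback_aliases and the BEFORE/AFTER constants are made up for illustration, with the two file contents copied from this thread.

```python
import ipaddress

# /etc/hosts as first posted: lserver5 shares the 127.0.0.1 line.
BEFORE = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 lserver5 localhost.localdomain localhost
192.168.1.57 lserver7
"""

# /etc/hosts after the change: lserver5 has its own line.
AFTER = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.1.55 lserver5
192.168.1.57 lserver7
"""


def loopback_aliases(hosts_text):
    """Return every name that hosts-style text maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        addr, *aliases = line.split()
        try:
            if ipaddress.ip_address(addr).is_loopback:
                names.update(aliases)
        except ValueError:
            # Not a parseable IP address; skip the line.
            continue
    return names


if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        bad = "lserver5" in loopback_aliases(text)
        print(label, "-> lserver5 on a loopback line:", bad)
```

To check a live machine, the same function could be given the contents of /etc/hosts and the hostname compared against the result.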
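
On Linux, the "errno = 111" in the MasterLog excerpt at the top of the thread is ECONNREFUSED: the host answers (ping uses ICMP and succeeds), but nothing is accepting TCP connections at that address and port. A rough Python sketch for probing a TCP port is below. The helper name tcp_probe and the timeout value are assumptions for illustration; the address and port are the ones shown in the log.

```python
import errno
import os
import socket


def tcp_probe(host, port, timeout=2.0):
    """Try a TCP connect; return 0 on success, otherwise the errno value."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns an error code instead of raising for
        # ordinary connection failures.
        return sock.connect_ex((host, port))


if __name__ == "__main__":
    # Collector address and port as they appear in the MasterLog.
    code = tcp_probe("192.168.1.55", 9618)
    if code == 0:
        print("something is listening on 192.168.1.55:9618")
    else:
        name = errno.errorcode.get(code, "unknown")
        print("connect failed:", code, name, os.strerror(code))
```

A refused connection here, while ping still works, would point at the collector not listening on 192.168.1.55:9618 rather than at a routing or name-resolution problem.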