[Condor-users] condor_master can't connect with collector


Date: Tue, 8 Feb 2005 16:43:16 -0500
From: "Dave Lajoie" <dlajoie@xxxxxxxxxxxxxxxxxxxx>
Subject: [Condor-users] condor_master can't connect with collector
Hello Guys!
    I have been working on a Linux Condor deployment and ran into an issue.
    Basically, condor_master can't connect to condor_collector;
    it seems like condor_master is using the wrong port.
 
Notice that the collector opens its command socket on port 34287,
while the master is attempting to connect on port 9618.
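A quick way to check which of these two ports is actually accepting connections is a plain TCP probe. The sketch below is a hypothetical helper (`probe_tcp` is not part of Condor), assuming Python 3 on a host that can reach the machine in question. It reports "open" when something accepts the connection and otherwise the symbolic errno name. On Linux, errno 111 (as in the master log) is ECONNREFUSED, meaning nothing was listening at that address and port.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Hypothetical diagnostic helper: attempt a TCP connect to host:port.

    Returns "open" if something accepts the connection, "timeout" if no
    answer arrives within `timeout` seconds, or the symbolic errno name
    (e.g. "ECONNREFUSED", which is errno 111 on Linux) for other failures.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # Checked first because socket.timeout is itself an OSError subclass.
        return "timeout"
    except OSError as exc:
        # Map the numeric errno to its symbolic name when one is known.
        return errno.errorcode.get(exc.errno, str(exc.errno))
```

For example, run from the pool network, `probe_tcp("192.168.10.207", 9618)` would be expected to return "ECONNREFUSED" while the collector is bound to 34287 instead, and `probe_tcp("192.168.10.207", 34287)` to return "open".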
 
In this case, the master could not start the collector process,
so I started it manually in order to get more information
about the error.
 
I ran condor_init before running sbin/condor_master.
Do I need to add some entries to /etc/services?
 
I am probably missing something obvious again...
Any help is welcome (I am so close to getting it working ;)
Dave.
 
Here are the logs:
 
Collector:
2/8 16:09:06 ******************************************************
2/8 16:09:06 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
2/8 16:09:06 ** /NET/LINUX_SERVEUR/CONDOR/sbin/condor_collector
2/8 16:09:06 ** $CondorVersion: 6.6.8 Jan 27 2005 $
2/8 16:09:06 ** $CondorPlatform: I386-LINUX_RH9 $
2/8 16:09:06 ** PID = 3880
2/8 16:09:06 ******************************************************
2/8 16:09:06 Using config file: /home/condor/condor_config
2/8 16:09:06 Using local config files: /NET/LINUX_SERVEUR/CONDOR/hosts/rn207/condor_config.local
2/8 16:09:06 DaemonCore: Command Socket at <192.168.10.207:34287>
2/8 16:09:06 In ViewServer::Init()
2/8 16:09:06 In CollectorDaemon::Init()
2/8 16:09:06 In ViewServer::Config()
2/8 16:09:06 In CollectorDaemon::Config()
2/8 16:09:11 enable: Creating stats hash table
2/8 16:24:11 Housekeeper:  Ready to clean old ads
2/8 16:24:11  Cleaning StartdAds ...
2/8 16:24:11  Cleaning StartdPrivateAds ...
2/8 16:24:11  Cleaning ScheddAds ...
2/8 16:24:11  Cleaning SubmittorAds ...
2/8 16:24:11  Cleaning LicenseAds ...
2/8 16:24:11  Cleaning MasterAds ...
2/8 16:24:11  Cleaning CkptServerAds ...
2/8 16:24:11  Cleaning CollectorAds ...
2/8 16:24:11  Cleaning StorageAds ...
2/8 16:24:11 Housekeeper:  Done cleaning
2/8 16:39:11 Housekeeper:  Ready to clean old ads
2/8 16:39:11  Cleaning StartdAds ...
2/8 16:39:11  Cleaning StartdPrivateAds ...
2/8 16:39:11  Cleaning ScheddAds ...
2/8 16:39:11  Cleaning SubmittorAds ...
2/8 16:39:11  Cleaning LicenseAds ...
2/8 16:39:11  Cleaning MasterAds ...
2/8 16:39:11  Cleaning CkptServerAds ...
2/8 16:39:11  Cleaning CollectorAds ...
2/8 16:39:11  Cleaning StorageAds ...
2/8 16:39:11 Housekeeper:  Done cleaning
Master:
2/8 16:08:33 ******************************************************
2/8 16:08:33 ** condor_master (CONDOR_MASTER) STARTING UP
2/8 16:08:33 ** /NET/LINUX_SERVEUR/CONDOR/sbin/condor_master
2/8 16:08:33 ** $CondorVersion: 6.6.8 Jan 27 2005 $
2/8 16:08:33 ** $CondorPlatform: I386-LINUX_RH9 $
2/8 16:08:33 ** PID = 3864
2/8 16:08:33 ******************************************************
2/8 16:08:33 Using config file: /home/condor/condor_config
2/8 16:08:33 Using local config files: /NET/LINUX_SERVEUR/CONDOR/hosts/rn207/condor_config.local
2/8 16:08:33 DaemonCore: Command Socket at <192.168.10.207:34244>
2/8 16:08:33 Started DaemonCore process "/NET/LINUX_SERVEUR/CONDOR/sbin/condor_startd", pid and pgroup = 3865
2/8 16:08:33 Started DaemonCore process "/NET/LINUX_SERVEUR/CONDOR/sbin/condor_schedd", pid and pgroup = 3866
2/8 16:08:38 Can't connect to <192.168.10.207:9618>:0, errno = 111
2/8 16:08:38 Will keep trying for 10 seconds...
2/8 16:08:48 Connect failed for 10 seconds; returning FALSE
2/8 16:08:48 ERROR:
SECMAN:2003:TCP connection to <192.168.10.207:9618> failed
 
2/8 16:08:48 Can't send UPDATE_MASTER_AD to collector rn207.bbfxa.com <192.168.10.207:9618>: Failed to send UDP update command to collector
2/8 16:13:48 Can't connect to <192.168.10.207:9618>:0, errno = 111
2/8 16:13:48 Will keep trying for 10 seconds...
2/8 16:13:58 Connect failed for 10 seconds; returning FALSE
2/8 16:13:58 ERROR:
SECMAN:2003:TCP connection to <192.168.10.207:9618> failed