Re: [Condor-users] No collector -- no connection to 9618


Date: Wed, 02 Feb 2005 11:36:14 +0530
From: Prashant Lal <lalp@xxxxxxxxxxx>
Subject: Re: [Condor-users] No collector -- no connection to 9618
The /etc/hosts should be like:

127.0.0.1               localhost.localdomain localhost
169.237.X.X      master    master.somedomain.com

Is it like that?
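
For anyone who wants to check this quickly, here is a rough sketch in Python. I am assuming the concern is that the master's own name might be listed only on the 127.0.0.1 line, in which case daemons could end up resolving it to loopback. The function names are made up for illustration, and the sample uses a documentation address (192.0.2.10) and example.com rather than real values.

```python
def parse_hosts(text):
    """Map each hostname/alias in hosts-file text to the addresses it is listed under."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, hostnames = fields[0], fields[1:]
        for name in hostnames:
            names.setdefault(name.lower(), []).append(address)
    return names


def loopback_only(names, hostname):
    """True if hostname is listed, but only under loopback (127.x.x.x or ::1) addresses."""
    addresses = names.get(hostname.lower(), [])
    return bool(addresses) and all(
        a.startswith("127.") or a == "::1" for a in addresses
    )


# Hypothetical sample data, not the poster's real file.
SAMPLE = """\
127.0.0.1    localhost.localdomain localhost
192.0.2.10   master master.example.com
"""

if __name__ == "__main__":
    names = parse_hosts(SAMPLE)
    print(names["master"])
    print(loopback_only(names, "master"))
```

To run it against the real file, pass `open("/etc/hosts").read()` to `parse_hosts` and use the machine's actual hostname in place of `master`.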


On Wed, 2005-02-02 at 06:36, Michael Hannon wrote:
Greetings.  I recently installed Condor on a system running Redhat 
Enterprise Linux server, version 3.  I used the install script, rather 
than the RPM.

Everything seemed to install properly, so I went to the "...now what?" 
section of the manual.  I immediately noticed that the collector and 
negotiator processes were NOT running on the master server.

There are numerous error messages related to this, but they all seem to 
come down to :

Can't connect to <169.237.mm.nn:9618>:0, errno = 111

where "169.237.mm.nn" is the IP address of the master server, of course.
See the appended segment of the Master log file, for example.
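
On Linux, errno 111 is ECONNREFUSED, which usually means the host was reachable but nothing was listening on the port. A minimal Python sketch for reproducing the check outside of Condor follows. The helper names `probe` and `describe` are made up for illustration, and the address in the example is a placeholder for the master's real IP.

```python
import errno
import os
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect; return 0 on success or the errno value on failure.

    Name-resolution failures are not converted and still raise socket.gaierror.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port))


def describe(code):
    """Turn a connect_ex() result into a short human-readable diagnosis."""
    if code == 0:
        return "connected: something is listening"
    if code == errno.ECONNREFUSED:
        return "connection refused: host reachable, but nothing is listening on that port"
    return "failed: " + os.strerror(code)


if __name__ == "__main__":
    # Placeholder address; substitute the master's IP. 9618 is the port from the log.
    print(describe(probe("127.0.0.1", 9618)))
```

If this reports "connection refused" when run on the master itself, that would be consistent with the collector simply not running, rather than with a firewall or hosts.deny problem.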

I've done google searches for this and have found a number of instances 
of the problem.  I've tried to follow all of the suggestions that I 
found, but so far nothing has helped.

The problem doesn't seem to be a firewall issue, nor a hosts.deny issue, 
as I briefly disabled both of them while trying to start Condor.

The only thing that may be slightly unusual is that I installed Condor 
into /usr/local/condor, then had to move that directory to another 
partition in order to save space on the original.  There is still a symlink.

The config file seems to be in a place such that Condor can find it. 
The Condor binaries are in the "condor" user's path.  HOSTALLOW access 
is granted to every address in our subnet.  The master server's IP 
address is available both in /etc/hosts and via DNS.  I've tried making 
the condor user the owner of all the files in /usr/local/condor/..., so 
as to eliminate file-access problems.

If you can think of something else I should be looking at, please let me 
know.

Thanks.

					- Mike

1/31 19:19:46 ******************************************************
1/31 19:19:46 ** condor_master (CONDOR_MASTER) STARTING UP
1/31 19:19:46 ** /scratch/condor/sbin/condor_master
1/31 19:19:46 ** $CondorVersion: 6.6.7 Oct 11 2004 $
1/31 19:19:46 ** $CondorPlatform: I386-LINUX_RH9 $
1/31 19:19:46 ** PID = 21713
1/31 19:19:46 ******************************************************
1/31 19:19:46 Using config file: /home/condor/condor_config
1/31 19:19:46 Using local config files: /home/condor/hosts/<master>/condor_config.local
1/31 19:19:46 DaemonCore: Command Socket at <169.237.mm.nn:40295>
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 21714
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 21715
1/31 19:19:51 Can't connect to <169.237.mm.nn:9618>:0, errno = 111
1/31 19:19:51 Will keep trying for 10 seconds...
1/31 19:20:01 Connect failed for 10 seconds; returning FALSE
1/31 19:20:01 ERROR:
SECMAN:2003:TCP connection to <169.237.mm.nn:9618> failed

1/31 19:20:01 Can't send UPDATE_MASTER_AD to collector <master>.physics.ucdavis.edu <169.237.mm.nn>
Thanks and Regards
P r a s h a n t  L a l

Cadence Design Systems

Noida Export Processing Zone,
Noida - 201301,
Phone:+91 120 2562842, extn 4009
Fax:+91 120 2562231
Cell:+91 98101-44168

mailto:lalp@cadence.com