[Condor-users] No collector -- no connection to 9618
- Date: Tue, 01 Feb 2005 17:06:10 -0800
- From: Michael Hannon <jmh@xxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] No collector -- no connection to 9618
Greetings. I recently installed Condor on a system running Red Hat
Enterprise Linux Server, version 3. I used the install script rather
than the RPM.
Everything seemed to install properly, so I went to the "...now what?"
section of the manual. I immediately noticed that the collector and
negotiator processes were NOT running on the master server.
There are numerous error messages related to this, but they all seem to
come down to:
Can't connect to <169.237.mm.nn:9618>:0, errno = 111
where "169.237.mm.nn" is the IP address of the master server, of course.
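For reference, errno 111 on Linux is ECONNREFUSED: the master server's TCP stack answered, but no process was listening on port 9618, the default collector port. The minimal Python sketch below reproduces that check and decodes the errno. The helper names `probe` and `describe` are hypothetical illustrations, not part of Condor.

```python
import errno
import os
import socket


def probe(host: str, port: int = 9618, timeout: float = 2.0) -> int:
    """Try a plain TCP connect, roughly what condor_master attempts when it
    contacts the collector. Returns 0 on success, otherwise the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port))


def describe(code: int) -> str:
    """Turn a connect_ex() result into a human-readable line."""
    if code == 0:
        return "connected: something is listening"
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno {code} ({name}): {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux, errno.ECONNREFUSED == 111: the host is reachable, but no
    # process (e.g. no collector) has bound the port.
    print(describe(errno.ECONNREFUSED))
    print(describe(probe("127.0.0.1")))
```

A result of ECONNREFUSED rather than a timeout points away from a firewall that silently drops packets and toward nothing listening on the port.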
See the appended segment of the Master log file, for example.
I've done Google searches for this and have found a number of instances
of the problem. I've tried to follow all of the suggestions that I
found, but so far nothing has helped.
The problem doesn't seem to be a firewall or hosts.deny issue, as I
briefly disabled both of them while trying to start Condor.
The only thing that may be slightly unusual is that I installed Condor
into /usr/local/condor, then had to move that directory to another
partition in order to save space on the original. There is still a symlink.
The config file seems to be somewhere Condor can find it.
The Condor binaries are in the "condor" user's path. HOSTALLOW access
is granted to every address in our subnet. The master server's IP
address is available both in /etc/hosts and via DNS. I've tried making
the condor user the owner of all the files in /usr/local/condor/..., so
as to eliminate file-access problems.
If you can think of something else I should be looking at, please let me
know.
Thanks.
- Mike
1/31 19:19:46 ******************************************************
1/31 19:19:46 ** condor_master (CONDOR_MASTER) STARTING UP
1/31 19:19:46 ** /scratch/condor/sbin/condor_master
1/31 19:19:46 ** $CondorVersion: 6.6.7 Oct 11 2004 $
1/31 19:19:46 ** $CondorPlatform: I386-LINUX_RH9 $
1/31 19:19:46 ** PID = 21713
1/31 19:19:46 ******************************************************
1/31 19:19:46 Using config file: /home/condor/condor_config
1/31 19:19:46 Using local config files: /home/condor/hosts/<master>/condor_config.local
1/31 19:19:46 DaemonCore: Command Socket at <169.237.mm.nn:40295>
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 21714
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 21715
1/31 19:19:51 Can't connect to <169.237.mm.nn:9618>:0, errno = 111
1/31 19:19:51 Will keep trying for 10 seconds...
1/31 19:20:01 Connect failed for 10 seconds; returning FALSE
1/31 19:20:01 ERROR: SECMAN:2003:TCP connection to <169.237.mm.nn:9618> failed
1/31 19:20:01 Can't send UPDATE_MASTER_AD to collector <master>.physics.ucdavis.edu <169.237.mm.nn>
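The log segment above can also be checked mechanically for which daemons condor_master actually launched. The sketch below is a hypothetical Python helper, not a Condor tool. It assumes the "Started DaemonCore process" line format seen in this log, and that the master server should also run the collector and negotiator, as noted earlier.

```python
import re

# Sample lines copied from the Master log above (wrapped lines rejoined).
LOG = """\
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 21714
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 21715
1/31 19:19:51 Can't connect to <169.237.mm.nn:9618>:0, errno = 111
"""

# \s+ also tolerates a line break after "process", as in wrapped log output.
STARTED = re.compile(r'Started DaemonCore process\s+"[^"]*/condor_(\w+)"')


def started_daemons(log_text: str) -> set[str]:
    """Return the daemon names (e.g. 'startd') the master reports launching."""
    return set(STARTED.findall(log_text))


def missing(log_text: str,
            expected=("collector", "negotiator", "startd", "schedd")) -> list[str]:
    """List expected daemons that never appear in a 'Started' line."""
    started = started_daemons(log_text)
    return [d for d in expected if d not in started]


if __name__ == "__main__":
    print(sorted(started_daemons(LOG)))  # ['schedd', 'startd']
    print(missing(LOG))                  # ['collector', 'negotiator']
```

Run against this excerpt, it reports only the startd and schedd as launched, matching the observation that the collector and negotiator were not running.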
--
Michael Hannon mailto:hannon@xxxxxxxxxxxxxxxxxxx
Dept. of Physics 530.752.4966
University of California 530.752.4717 FAX
Davis, CA 95616-8677