Date: | Tue, 01 Feb 2005 17:06:10 -0800 |
---|---|
From: | Michael Hannon <jmh@xxxxxxxxxxxxxxxxxxx> |
Subject: | [Condor-users] No collector -- no connection to 9618 |
Greetings. I recently installed Condor on a system running Redhat Enterprise Linux server, version 3. I used the install script, rather than the RPM.

Everything seemed to install properly, so I went to the "...now what?" section of the manual. I immediately noticed that the collector and negotiator processes were NOT running on the master server.

There are numerous error messages related to this, but they all seem to come down to:

Can't connect to <169.237.mm.nn:9618>:0, errno = 111

where "169.237.mm.nn" is the IP address of the master server, of course. See the appended segment of the Master log file, for example.

I've done google searches for this and have found a number of instances of the problem. I've tried to follow all of the suggestions that I found, but so far nothing has helped.

The problem doesn't seem to be a firewall issue, nor a hosts.deny issue, as I briefly disabled both of them while trying to start Condor.

The only thing that may be slightly unusual is that I installed Condor into /usr/local/condor, then had to move that directory to another partition in order to save space on the original. There is still a symlink.

The config file seems to be in a place such that Condor can find it. The Condor binaries are in the "condor" user's path. HOSTALLOW access is granted to every address in our subnet. The master server's IP address is available both in /etc/hosts and via DNS. I've tried making the condor user the owner of all the files in /usr/local/condor/..., so as to eliminate file-access problems.

If you can think of something else I should be looking at, please let me know.

Thanks.
- Mike

1/31 19:19:46 ******************************************************
1/31 19:19:46 ** condor_master (CONDOR_MASTER) STARTING UP
1/31 19:19:46 ** /scratch/condor/sbin/condor_master
1/31 19:19:46 ** $CondorVersion: 6.6.7 Oct 11 2004 $
1/31 19:19:46 ** $CondorPlatform: I386-LINUX_RH9 $
1/31 19:19:46 ** PID = 21713
1/31 19:19:46 ******************************************************
1/31 19:19:46 Using config file: /home/condor/condor_config
1/31 19:19:46 Using local config files: /home/condor/hosts/<master>/condor_config.local
1/31 19:19:46 DaemonCore: Command Socket at <169.237.mm.nn:40295>
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 21714
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 21715
1/31 19:19:51 Can't connect to <169.237.mm.nn:9618>:0, errno = 111
1/31 19:19:51 Will keep trying for 10 seconds...
1/31 19:20:01 Connect failed for 10 seconds; returning FALSE
1/31 19:20:01 ERROR: SECMAN:2003:TCP connection to <169.237.mm.nn:9618> failed
1/31 19:20:01 Can't send UPDATE_MASTER_AD to collector <master>.physics.ucdavis.edu <169.237.mm.nn>

--
Michael Hannon            mailto:hannon@xxxxxxxxxxxxxxxxxxx
Dept. of Physics          530.752.4966
University of California  530.752.4717 FAX
Davis, CA 95616-8677
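
For anyone reproducing this: on Linux, errno 111 is ECONNREFUSED, which usually means the host was reachable but nothing was listening on that port. A minimal sketch of a probe that separates "refused" from "timed out" is below. The function name `probe_tcp` and the command-line handling are illustrative assumptions, not part of Condor; the host and port are supplied by the caller.

```python
import socket
import sys


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): host answered, no listener.
        return "refused"
    except socket.timeout:
        # No answer at all, which is more typical of a packet filter.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or unreachable network.
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage (hypothetical script name): python probe.py <host> <port>
    if len(sys.argv) != 3:
        sys.exit("usage: probe.py <host> <port>")
    print(probe_tcp(sys.argv[1], int(sys.argv[2])))
```

A "refused" result for port 9618 on the master would be consistent with the log above, where the master reports starting condor_startd and condor_schedd but shows no line for a collector.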