I think I fixed the problem. The /etc/hosts file on every node in the cluster, including the master, still contained the old IP information. Condor is working again now. Thank you for your help, though.
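For anyone hitting the same symptom, the check described above (looking for the old address in each node's /etc/hosts) could be sketched roughly as follows. This is only an illustration: the function name, the sample addresses and the host names are hypothetical, and it assumes the usual hosts layout of an address followed by one or more names.

```python
def find_stale_hosts_entries(hosts_text, old_ip):
    """Return (line_number, line) pairs whose address field equals old_ip."""
    stale = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        # Ignore anything after a '#', then surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # The first field of a hosts entry is the IP address.
        if fields[0] == old_ip:
            stale.append((number, line))
    return stale


if __name__ == "__main__":
    # Hypothetical addresses and names, for illustration only.
    sample = (
        "127.0.0.1   localhost\n"
        "192.0.2.10  master.example.org master   # address before the move\n"
        "192.0.2.21  node01.example.org node01\n"
    )
    for number, line in find_stale_hosts_entries(sample, "192.0.2.10"):
        print(f"line {number}: {line}")
```

On a real node the text would come from reading /etc/hosts, and the check would have to be repeated on the master and on every compute node.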
Li Xi
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
E-mail:sealyxi@xxxxxxxxx
From: "Tao.3.Chen@xxxxxxxxxxxxxxxxxxxxxxxxxxx" <Tao.3.Chen@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Sent: Tuesday, September 8, 2009 3:18:35 AM
Subject: [Condor-users] Reply: Re: -- Failed to fetch ads from: <ip adress> : hostname
Hi,
I am also new to Condor, and I am not sure what exactly you changed.
Why did you change the IP only in condor_config.local? I would suggest
checking the condor_config file as well, especially CONDOR_HOST,
HOSTALLOW_READ, HOSTALLOW_WRITE and HOSTALLOW_CONFIG. It may be worth
spending a little time reading through the config file.
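As a rough aid for that review, the sketch below pulls the settings named above (plus NETWORK_INTERFACE from the original post) out of configuration text so they can be compared against the new addresses. It assumes simple "NAME = value" lines, ignores macros and any other syntax, and treats names case-insensitively as a simplification; the function name and sample values are hypothetical.

```python
WATCHED = (
    "CONDOR_HOST",
    "NETWORK_INTERFACE",
    "HOSTALLOW_READ",
    "HOSTALLOW_WRITE",
    "HOSTALLOW_CONFIG",
)


def read_settings(config_text, names=WATCHED):
    """Collect the last value assigned to each watched name."""
    settings = {}
    for line in config_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, value = stripped.partition("=")
        if not sep:
            continue  # not a simple assignment
        name = name.strip().upper()
        if name in names:
            settings[name] = value.strip()
    return settings


if __name__ == "__main__":
    # Hypothetical configuration text, for illustration only.
    sample = (
        "# local overrides\n"
        "CONDOR_HOST = master.example.org\n"
        "NETWORK_INTERFACE = 192.0.2.10\n"
        "HOSTALLOW_WRITE = *.example.org\n"
    )
    for name, value in read_settings(sample).items():
        print(f"{name} = {value}")
```

Because this only reads one file at a time and does not expand macros, querying the running configuration with condor_config_val (for example, condor_config_val CONDOR_HOST) is likely the more reliable check on a live system.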
I would also suggest searching Google for related information; the
Condor resources available online are extensive.
PS: the character formatting of your message makes it hard to read. I
just want to help, since I have received so much help from others in
this group :)
Li Xi <sealyxi@xxxxxxxxx>
Sent by: condor-users-bounces@xxxxxxxxxxx
09/07/2009 07:20 PM
Please reply to: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Cc:
Subject: Re: [Condor-users] -- Failed to fetch ads from: <ip adress> : hostname
Some follow-up information on the same problem.
I checked the MasterLog and found entries like the following repeating
every 8 minutes:
9/7 12:07:58 Can't connect to <old_ip:old_port>:0, errno = 110
9/7 12:07:58 Will keep trying for 10 seconds...
9/7 12:07:59 Connect failed for 10 seconds; returning FALSE
9/7 12:07:59 ERROR: SECMAN:2003:TCP connection to <old_ip:old_port> failed
where old_ip is the IP address of the master node before the cluster was
moved. Similar records appear in the NegotiatorLog and SchedLog. Apparently
I still need to change the IP information somewhere other than the place
mentioned in my previous email, but I cannot figure out where.
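To see at a glance which addresses the daemons are still trying to reach, log entries of the form quoted above can be tallied with a short script. This is a sketch only: the regular expression is written against the "Can't connect to <ip:port>" lines shown in this message, and the function name and the sample address and port are hypothetical.

```python
import re
from collections import Counter

# Matches the address inside lines such as:
#   9/7 12:07:58 Can't connect to <old_ip:old_port>:0, errno = 110
CONNECT_FAILURE = re.compile(r"Can't connect to <([^>:]+):([^>]+)>")


def failed_targets(log_text):
    """Count failed connection attempts per target IP address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CONNECT_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = (
        "9/7 12:07:58 Can't connect to <192.0.2.10:9618>:0, errno = 110\n"
        "9/7 12:07:58 Will keep trying for 10 seconds...\n"
        "9/7 12:15:58 Can't connect to <192.0.2.10:9618>:0, errno = 110\n"
    )
    for ip, count in failed_targets(sample).items():
        print(f"{ip}: {count} failed attempts")
```

If the only address reported is the pre-move one, some configuration or name-resolution source on that machine is still handing out the old IP.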
Li Xi
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
E-mail:sealyxi@xxxxxxxxx
From: Li Xi <sealyxi@xxxxxxxxx>
To: condor-users@xxxxxxxxxxx
Sent: Friday, September 4, 2009 1:13:57 PM
Subject: [Condor-users] -- Failed to fetch ads from: <ip adress> : hostname
Hello, all,
I am a beginner with Condor and am having real trouble managing our cluster.
It is a small cluster with one master node (acting as the Condor server) and
16 compute nodes. We recently disassembled the cluster and moved it to
another location, and after we plugged everything back in and turned on all
the machines, we found that Condor was not working. Since the IP address of
the master node had changed, I assumed something needed to be changed in the
Condor configuration as well. So I opened the "condor_config.local" file on
the master node and updated the "NETWORK_INTERFACE" entry. After that I was
able to start Condor:
# ps -ef | grep condor
condor    3639     1  0 Sep03 ?        00:00:11 /opt/condor/sbin/condor_master
condor    3651  3639  0 Sep03 ?        00:00:00 condor_collector -f
condor    3652  3639  0 Sep03 ?        00:00:01 condor_schedd -f
condor    3653  3639  0 Sep03 ?        00:00:00 condor_negotiator -f
root     15130 15111  0 12:55 pts/1    00:00:00 grep condor
But when I type "condor_q", it sometimes returns the queue, but most of
the time it returns:
-- Failed to fetch ads from: <ip adress> : hostname
The behavior seems very unstable. I have rebooted the master node once, and
it did not help. Also, the jobs in the queue are still idle; they have not
been sent to the compute nodes (the system has been up for almost a day
now, and I am able to ssh to those nodes). I am not sure whether there is
anything else I need to change after the move, or whether something went
wrong. Any help would be appreciated.
Thanks
Li Xi
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
E-mail:sealyxi@xxxxxxxxx