RE: [Condor-users] Failover feature in condor 6.7.5

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Date: Tue, 8 Mar 2005 19:13:37 +0200

From: "Gabi Kliot" <gabik@xxxxxxxxxxxxxxxxx>

Subject: RE: [Condor-users] Failover feature in condor 6.7.5

You need to update the executables on all machines (all schedds, startds, and masters), since they are clients that must now send their ads to multiple collectors instead of a single one. It is best to update ALL the other executables as well, including the Collector, Negotiator, etc.

Then update the config files: the global config file for all clients (with the new COLLECTOR_HOST) and the local config file on the Collector machines (with the new DAEMON_LIST).
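
As a rough sketch of those two changes (the host names cm1.example.org and cm2.example.org are hypothetical, and the comma-separated form of COLLECTOR_HOST is an assumption here, so check it against the manual for your release):

```
# Global config file, read by all clients in the pool.
# Both collectors are listed so that clients send their ads to each of them.
# (Host names are placeholders, not taken from this thread.)
COLLECTOR_HOST = cm1.example.org, cm2.example.org

# Local config file on each Collector machine only.
# COLLECTOR is added to the daemons the master starts; keep whatever
# other daemons that machine already runs in the list as well.
DAEMON_LIST = MASTER, COLLECTOR
```

The second fragment belongs only in the local config of the machines that should run a Collector; the other machines keep their existing DAEMON_LIST.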

Updating the config files alone is not enough; you will also need to restart all your daemons. You can do that with condor_restart -all (I think).

The failover feature has nothing to do with flocking. They are two different mechanisms for different things.

Regards,

Gabi

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Prakash Velayutham
Sent: Tuesday, March 08, 2005 5:47 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Failover feature in condor 6.7.5

Gabi Kliot wrote:

Hi

>So the only place I need to change is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second
> collector in the COLLECTOR_HOST variable. Would it be enough to just restart condor on the second server after doing this?

You need to add the IP of the second Collector to the COLLECTOR_HOST variable in the $CONDOR_HOME/etc/condor_config file, and add COLLECTOR to the DAEMON_LIST variable in the local config file of the second Collector machine (just as is done for the first Collector machine).

>Also is the NEGOTIATOR failover done the same way by adding the second server's IP to NEGOTIATOR_HOST variable?
>Is there a document that explains how these configs are done? I would be willing to experiment with this and write a small doc if
> required.

Negotiator failover has not been released yet. It is planned to be part of the next Condor release, hopefully by Condor Week.

Whenever it is released, it will of course be accompanied by a detailed manual section on its installation and configuration (it will actually be a unified section about Collector and Negotiator high availability).

It actually makes me very happy to know that there are people interested in and anticipating the Negotiator failover feature in Condor. We are working hard these days to make it happen.

Regards,

Gabi

So the only place I need to change is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second collector in the COLLECTOR_HOST variable. Would it be enough to just restart condor on the second server after doing this? I get some errors of this kind when I do this...

DC_AUTHENTICATE: attempt to open invalid session frontier:17998:1110236945:14, failing

Any suggestions? Also, is the NEGOTIATOR failover done the same way, by adding the second server's IP to the NEGOTIATOR_HOST variable? Is there a document that explains how these configs are done? I would be willing to experiment with this and write a small doc if required.

Thanks,
Prakash
Thanks. Could you explain how to restart services on the different machines after changing the COLLECTOR_HOST variable? Is it enough to just modify the local config file on the second collector to start the collector daemon and restart that server, or should I restart condor on all the machines in the pool? Also, does this failover have anything to do with flocking at all (sorry for that stupid question, it's just that I have never used flocking before)?

Prakash
