RE: [Condor-users] Failover feature in condor 6.7.5

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Date: Tue, 8 Mar 2005 17:41:27 +0200

From: "Gabi Kliot" <gabik@xxxxxxxxxxxxxxxxx>

Subject: RE: [Condor-users] Failover feature in condor 6.7.5


>So the only place I need to change is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second collector in the COLLECTOR_HOST variable. Would it be enough to just restart condor on the second server after doing this?

You need to add the IP address of the second collector to the COLLECTOR_HOST variable in the $CONDOR_HOME/etc/condor_config file. You also need to add COLLECTOR to the DAEMON_LIST variable in the local config file of the second collector machine, exactly as is done for the first collector machine.
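A minimal sketch of those two changes follows. It assumes two central-manager machines with the hypothetical addresses 192.0.2.10 (first collector) and 192.0.2.11 (second collector), and it assumes COLLECTOR_HOST accepts a comma-separated list of collectors; check the exact syntax against the release notes and manual for your Condor version.

```
# Global config: $CONDOR_HOME/etc/condor_config, shared by every machine in the pool.
# Hypothetical addresses: list both collectors, comma-separated.
COLLECTOR_HOST = 192.0.2.10, 192.0.2.11

# Local config file of the second collector machine (192.0.2.11 here).
# Start a collector on this machine, just as on the first central manager.
# Keep any other daemons this machine already runs (e.g. STARTD, SCHEDD) in the list.
DAEMON_LIST = MASTER, COLLECTOR
```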

>Also is the NEGOTIATOR failover done the same way by adding the second server's IP to NEGOTIATOR_HOST variable?

>Is there a document that explains how these configs are done? I would be willing to experiment with this and write a small doc if required.

Negotiator failover has not been released yet. It is planned to be part of the next Condor release, hopefully by Condor Week.

Whenever it is released, it will of course be accompanied by a detailed manual section covering its installation and configuration (in fact, a unified section on collector and negotiator high availability).

It makes me very happy to know that people are interested in, and looking forward to, the negotiator failover feature in Condor. We are working hard these days to make it happen.

Regards,

Gabi

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Prakash Velayutham
Sent: Tuesday, March 08, 2005 5:22 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Failover feature in condor 6.7.5

Nick LeRoy wrote:
On Mon March 7 2005 4:18 pm, Prakash Velayutham wrote:
  
Ian Chesal wrote:
    
Hi,

I understand that the failover is a feature added in
condor-6.7.x versions. But I don't understand how to enable
this and configure the pool to work with this setup. Can
anyone help? As far as I know, there is nothing in the
documentation. I would like to be corrected in this regard.
        
See:
http://www.cs.wisc.edu/condor/manual/v6.7.5/8_2Development_Release.html#SECTION00924000000000000000

The second bullet under "New Features" describes how to define multiple
collectors for failover.

- Ian
      
Hi Ian,

Thanks. What does the "High Availability" service under new features
section in the same link mean (8.2.6 Version 6.7.0)? It says:

Added a new "High Availability" service to the condor_master. You
can now specify a daemon which can have "fail over" capabilities (i.e.
the master on another machine can start a matching daemon if the first
one fails). Currently, this is only available over a shared file system
(i.e. NFS), and has only been tested for the condor_schedd.

I was looking to implement that. Is that the same as multiple collectors?
    
These are separate mechanisms, at least for now.  :-(  The feature that you 
describe above is currently just for schedd fail-over.  Separately, in recent 
6.7 Condor releases, your pool can now have redundant collectors.

A feature that we very much hope will make the next 6.7 release of Condor will 
provide for a fail-over mechanism for negotiators.  This is, again, a 
different mechanism.

-Nick

So the only place I need to change is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second collector in the COLLECTOR_HOST variable. Would it be enough to just restart condor on the second server after doing this? I get some errors of this kind when I do this...

DC_AUTHENTICATE: attempt to open invalid session frontier:17998:1110236945:14, failing

Any suggestions? Also, is NEGOTIATOR failover done the same way, by adding the second server's IP to the NEGOTIATOR_HOST variable? Is there a document that explains how these configs are done? I would be willing to experiment with this and write a small doc if required.

Thanks,
Prakash
