Hi
> So the only place I need to change is still the $CONDOR_HOME/etc/condor_config
> file, right? Here I added the IP of the second collector in the COLLECTOR_HOST
> variable. Would it be enough to just restart condor on the second server after
> doing this?
You need to add the IP of the second Collector to the COLLECTOR_HOST
variable in the $CONDOR_HOME/etc/condor_config file, and add COLLECTOR to
the DAEMON_LIST variable in the local config file of the second Collector
machine (just as is done for the first Collector machine).
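As a rough sketch, the two changes might look like the fragment below. The addresses are placeholders, the other daemons in DAEMON_LIST are only examples, and the comma-separated form of COLLECTOR_HOST is an assumption about the 6.7.x syntax, so please check it against the release notes for your version:

```
# $CONDOR_HOME/etc/condor_config (read by every machine in the pool)
# 192.0.2.10 = first Collector, 192.0.2.11 = second Collector (placeholder IPs)
COLLECTOR_HOST = 192.0.2.10, 192.0.2.11

# Local config file on the second Collector machine
# (MASTER and STARTD are examples; keep whatever that machine already runs)
DAEMON_LIST = MASTER, COLLECTOR, STARTD
```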
> Also is the NEGOTIATOR failover done the same way by adding the second server's
> IP to NEGOTIATOR_HOST variable? Is there a document that explains how these
> configs are done? I would be willing to experiment this and write a small doc
> if required.
Negotiator failover has not been released yet. It is planned to be part
of the next Condor release, hopefully by Condor Week.
Whenever it is released, it will of course be accompanied by a detailed
manual section on its installation and configuration (it will actually be a
unified section about Collector and Negotiator high availability).
It actually makes me very happy to know that there are people interested in
and anticipating the Negotiator failover feature in Condor. We are working
hard these days to make it happen.
Regards,
Gabi
Nick LeRoy wrote:
On Mon March 7 2005 4:18 pm, Prakash Velayutham wrote:
Ian Chesal wrote:
Hi,
I understand that the failover is a feature added in
condor-6.7.x versions. But I don't understand how to enable
this and configure the pool to work with this setup. Can
anyone help? As far as I know, there is nothing in the
documentation. I would like to be corrected in this regard.
See:
http://www.cs.wisc.edu/condor/manual/v6.7.5/8_2Development_Release.html#SECTION00924000000000000000
The second bullet under "New Features" describes how to define multiple
collectors for failover.
- Ian
Hi Ian,
Thanks. What does the "High Availability" service under new features
section in the same link mean (8.2.6 Version 6.7.0)? It says:
Added a new ``High Availability'' service to the /condor_master/. You
can now specify a daemon which can have ``fail over'' capabilities (i.e.
the master on another machine can start a matching daemon if the first
one fails). Currently, this is only available over a shared file system
(i.e. NFS), and has only been tested for the /condor_schedd/.
I was looking to implement that. Is that the same as multiple collectors?
These are separate mechanisms, at least for now. :-( The feature that you
describe above is currently just for schedd fail-over. Separately, in recent
6.7 Condor releases, your pool can now have redundant collectors.
A feature that we very much hope will make the next 6.7 release of Condor will
provide for a fail-over mechanism for negotiators. This is, again, a
different mechanism.
-Nick
So the only place I need to change is still the
$CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second
collector in the COLLECTOR_HOST variable. Would it be enough to just restart
condor on the second server after doing this? I get some errors of this kind
when I do this...
DC_AUTHENTICATE: attempt to open invalid session
frontier:17998:1110236945:14, failing
Any suggestions? Also is the
NEGOTIATOR failover done the same way by adding the second server's IP to
NEGOTIATOR_HOST variable? Is there a document that explains how these configs
are done? I would be willing to experiment this and write a small doc if
required.
Thanks, Prakash