Hi,
You need to update all executables on all machines (all schedds, startds,
and masters), since they are clients that should now send their ads to
multiple collectors instead of a single one. It is better to update ALL the
other executables as well, including the Collector, Negotiator, etc.
Then update the config files: the global config file for all clients (with
the new COLLECTOR_HOST) and the local config file for the Collector machines
(with the new DAEMON_LIST).
It is not enough to update only the config files; you will also need to
restart all your daemons. You can do that with condor_restart -all (I think).
The failover feature has nothing to do with flocking. They are two different
mechanisms for different things.
Regards,
Gabi
Gabi Kliot wrote:
Hi
> So the only place I need to change is still the
> $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the
> second collector in the COLLECTOR_HOST variable. Would it be enough to
> just restart condor on the second server after doing this?
You need to add the IP of the second Collector to the COLLECTOR_HOST
variable in the $CONDOR_HOME/etc/condor_config file, and add COLLECTOR to
the DAEMON_LIST variable in the local config file of the second Collector
machine (just as is done for the first Collector machine).
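For illustration, the two settings might look roughly like the sketch below. The host names are placeholders, and both the comma-separated form of COLLECTOR_HOST and the other daemons shown in DAEMON_LIST are assumptions, so adapt them to what your pool actually runs:

```
# Global config file ($CONDOR_HOME/etc/condor_config), read by all clients.
# collector1.example.org and collector2.example.org are placeholder names.
COLLECTOR_HOST = collector1.example.org, collector2.example.org

# Local config file on the second Collector machine only.
# MASTER and STARTD are assumed here; keep whatever that machine
# already lists and add COLLECTOR to it.
DAEMON_LIST = MASTER, COLLECTOR, STARTD
```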
> Also is the NEGOTIATOR failover done the same way by adding the second
> server's IP to NEGOTIATOR_HOST variable?
> Is there a document that explains how these configs are done? I would be
> willing to experiment with this and write a small doc if required.
Negotiator failover has not been released yet. It is planned to be part of
the next Condor release, hopefully by Condor Week.
Whenever it is released, it will of course be accompanied by a detailed
manual section on its installation and configuration (it will actually be a
unified section about Collector and Negotiator high availability).
It actually makes me very happy to know that there are people interested in
and anticipating the Negotiator failover feature in Condor. We are working
hard these days to make it happen.
Regards,
Gabi
So the only place I need to change
is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the
IP of the second collector in the COLLECTOR_HOST variable. Would it be
enough to just restart condor on the second server after doing this? I get
some errors of this kind when I do this...
DC_AUTHENTICATE: attempt to open invalid session frontier:17998:1110236945:14, failing
Any suggestions? Also, is the NEGOTIATOR failover done the same way, by
adding the second server's IP to the NEGOTIATOR_HOST variable? Is there a
document that explains how these configs are done? I would be willing to
experiment with this and write a small doc if required.
Thanks,
Prakash
Thanks. Could you explain how to restart services on the different machines
after changing the COLLECTOR_HOST variable? Is it enough to just modify the
local config file on the second collector to start the collector daemon and
restart that server, or should I restart condor on all the machines in the
pool? Also, does this failover have anything to do with flocking at all?
(Sorry for that stupid question, it's just that I have never used flocking
before.)
Prakash