Hi,
At GRIF we are currently testing HAD and Replication.
Everything works fine when dedicated ports are declared for HAD and
REPLICATION. However, when REPLICATION_USE_SHARED_PORT is set to TRUE,
the replication service refuses to start, and I see errors like these
in the Master log on both master servers:
03/10/17 17:01:26 ERROR: SharedPortEndpoint: failed to bind to 15f287e5db818c2dbce9638b70a6dc044992f0be80d2dc43848c983c1fc43fa5/MASTER: Address already in use
03/10/17 17:01:26 ERROR: Create_Process failed trying to start /usr/sbin/condor_replication
03/10/17 17:01:26 restarting /usr/sbin/condor_replication in 265 seconds
My HAD/REPLICATION configuration is below [1].
.... What am I doing wrong?
Thanks,
Andrea
[1]
HAD_USE_SHARED_PORT = TRUE
REPLICATION_USE_SHARED_PORT = TRUE
REPLICATION_LIST = lpnhe-gs9088.in2p3.fr:$(SHARED_PORT_PORT) llrmpicream.in2p3.fr:$(SHARED_PORT_PORT)
HAD_LIST = lpnhe-gs9088.in2p3.fr:$(SHARED_PORT_PORT) llrmpicream.in2p3.fr:$(SHARED_PORT_PORT)
HAD_CONTROLLEE = NEGOTIATOR
HAD_CONNECTION_TIMEOUT = 10
HAD_USE_PRIMARY = true
DAEMON_LIST = $(DAEMON_LIST) HAD REPLICATION
HAD_USE_REPLICATION = true
STATE_FILE = $(SPOOL)/Accountantnew.log
REPLICATION_INTERVAL = 300
MAX_TRANSFER_LIFETIME = 300
HAD_UPDATE_INTERVAL = 300
MASTER_NEGOTIATOR_CONTROLLER = HAD
MASTER_HAD_BACKOFF_CONSTANT = 360
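
To make the configuration above concrete, here is a rough Python sketch of what HAD_LIST and REPLICATION_LIST resolve to once $(SHARED_PORT_PORT) is substituted. It assumes SHARED_PORT_PORT is left at the HTCondor default of 9618 (the value actually in effect on these hosts is not shown in this post), and the expand() and endpoints() helpers are made up for illustration; they are a simplified stand-in, not HTCondor's real macro parser.

```python
import re

# Assumption: SHARED_PORT_PORT is at HTCondor's default of 9618.
# The value actually configured on these hosts is not shown in the post.
macros = {"SHARED_PORT_PORT": "9618"}

# The two lists exactly as given in the configuration [1].
config = {
    "HAD_LIST": "lpnhe-gs9088.in2p3.fr:$(SHARED_PORT_PORT) "
                "llrmpicream.in2p3.fr:$(SHARED_PORT_PORT)",
    "REPLICATION_LIST": "lpnhe-gs9088.in2p3.fr:$(SHARED_PORT_PORT) "
                        "llrmpicream.in2p3.fr:$(SHARED_PORT_PORT)",
}


def expand(value, macros):
    """Replace each $(NAME) with its value; unknown names are left untouched."""
    return re.sub(r"\$\((\w+)\)",
                  lambda m: macros.get(m.group(1), m.group(0)),
                  value)


def endpoints(value, macros):
    """Expand macros, then split the list on commas and/or whitespace."""
    return [e for e in re.split(r"[,\s]+", expand(value, macros)) if e]


had = endpoints(config["HAD_LIST"], macros)
replication = endpoints(config["REPLICATION_LIST"], macros)

print("HAD_LIST:        ", had)
print("REPLICATION_LIST:", replication)
print("identical host:port pairs:", had == replication)
```

Under that assumption, both lists expand to the same host:port pairs on the two masters, which is what one would expect when both daemons sit behind the shared port.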