[Condor-devel] STARTD_SENDS_ALIVES = TRUE is known safe?
- Date: Tue, 17 Aug 2010 07:36:26 -0400
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: [Condor-devel] STARTD_SENDS_ALIVES = TRUE is known safe?
Recently the default for STARTD_SENDS_ALIVES changed from FALSE to TRUE.
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=671
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1420
A few questions:
o What happens in the case of a pre-7.5.4 Schedd configured with
STARTD_SENDS_ALIVES=TRUE and a post-7.5.4 Startd?
o Currently we can disconnect a Schedd, and it will reconnect to its
running jobs when it returns.
. Is this still possible with the Startd sending the alives?
. What is the impact on the Startd when the Schedd is not accessible?
. Is a test being written for shadow-starter reconnect?
o A benefit of STARTD_SENDS_ALIVES is that the alives are sent over TCP
and ACKed.
. What other configuration changes must be made to a Schedd that is
managing tens of thousands of jobs?
. #671 suggests offloading work from the Schedd. What is the impact on
Schedd performance of responding to ALIVES? What extra resources are
used?
o Previously, if we wanted to renew all leases when a Schedd was going
down, we could do so with a small change to the Schedd. Is this still
possible, or must a new protocol be created between the Schedd and the
Startd?
Best,
matt