HTCondor Project List Archives




[Condor-devel] STARTD_SENDS_ALIVES = TRUE is known to be safe?



Recently the default for STARTD_SENDS_ALIVES changed from FALSE to TRUE.

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=671
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1420
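
For anyone who wants to keep the old behavior while these questions are open, the knob can be set explicitly instead of relying on the new default. This is a minimal sketch. It assumes the setting goes in the local configuration read by the Schedd, and which daemons actually need to see it should be confirmed in the manual:

```
# Override the new default: with FALSE, the Schedd sends the keepalives
# to the Startd (the pre-change direction) instead of the Startd sending them.
STARTD_SENDS_ALIVES = FALSE
```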

A few questions -

o What happens in the case of a pre-7.5.4 Schedd configured with STARTD_SENDS_ALIVES=TRUE and a post-7.5.4 Startd?
o We can currently disconnect a Schedd, and it will reconnect to its running jobs when it returns.
  . Is this still possible with the Startd sending the alives?
  . What is the impact on the Startd when the Schedd is not accessible?
  . Is a test being written for shadow-starter reconnect?
o A benefit of STARTD_SENDS_ALIVES is that the alives are sent over TCP and acknowledged.
  . What other configuration changes must be made to a Schedd that is managing tens of thousands of jobs?
  . #671 suggests offloading work from the Schedd. What is the impact on Schedd performance of responding to the alives? What extra resources are used?
o Previously, if we wanted to renew all leases when a Schedd was going down, we could do so with a small change to the Schedd. Is this still possible, or must a new protocol be created between the Schedd and the Startd?

Best,


matt