Paul Marshall wrote:
Hello, I haven't been able to find any more up-to-date information on this issue: https://www-auth.cs.wisc.edu/lists/condor-users/2007-March/msg00026.shtml Could someone point me in the right direction? What is the best way to decrease the time that it takes Condor to recognize a node has failed and drop it from the system?
There's work going on to reverse the keepalive message direction: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=671
You can experiment with that on your own by setting the following on both your startds and schedds:
STARTD_SENDS_ALIVES = true

-- Lans Carstensen
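
For reference, here is a rough sketch of how that setting might sit in a local config file alongside the keepalive timing knobs. ALIVE_INTERVAL and MAX_CLAIM_ALIVES_MISSED are documented in the Condor manual, but the values below are illustrative assumptions and not tested recommendations. How they interact with STARTD_SENDS_ALIVES may depend on your Condor version, so check the manual for your release before relying on this.

```
# Local config on both execute (startd) and submit (schedd) machines.

# Experimental: have the startd send keepalives to the schedd.
STARTD_SENDS_ALIVES = True

# Seconds between keepalive messages.
# Illustrative value; the manual default is, I believe, 300.
ALIVE_INTERVAL = 60

# Number of missed keepalives tolerated before the claim is treated as dead.
# Illustrative value; the manual default is, I believe, 6.
MAX_CLAIM_ALIVES_MISSED = 3
```

With these example values, a dead claim would be noticed after roughly 60 * 3 = 180 seconds instead of roughly 300 * 6 = 1800 seconds with the defaults. Run condor_reconfig (or restart the daemons) after changing the file.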