On 5/18/2017 2:17 AM, Michael Schwarzfischer wrote:
Dear all,
We are running condor 8.4.9 on Windows 7 clients. Somehow our network seems to have some stability issues from time to time… In particular, after login the network connection is sometimes not yet fully established, which leads to a crash in condor_kbdd.exe. This can easily be simulated by disabling the network adapter. Without the connection, condor_kbdd.exe just doesn’t start. Furthermore, there is no logging at all in that case (even with debug logging). We wonder if there are some workarounds or tricks to ensure that the condor_kbdd is started and that the process is still running.
Thanks!
Best,
Michael
Strange, my daily driver is a Windows laptop that always runs condor_kbdd.exe. I often log in when no network connectivity is available, and I have not observed a kbdd problem in years. Admittedly I tend to run the latest release (currently v8.7.1). There were some kbdd crash issues fixed back in v8.2.x, but a quick scan of the tickets at wiki.htcondor.org does not reveal any known problems in v8.4.x.
Just brainstorming, but perhaps you could try telling the condor_kbdd to communicate over the loopback network instead of a "real" IP address (which you may not have at that point). You could append the following to your HTCondor config to give this a try:
# Tell the condor_kbdd to only use 127.0.0.1 for any/all communication
# to the startd, and tell the startd to listen on all network interfaces
# (to be certain the startd listens on both ethernet and loopback).
KBDD.NETWORK_INTERFACE = 127.0.0.1
KBDD.BIND_ALL_INTERFACES = False
STARTD.BIND_ALL_INTERFACES = True

Hope the above helps,
Todd
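
On the second half of the question (making sure the process is still running), a rough sketch of a check that could be run periodically, for example from a scheduled task, might look like the following. It is not an HTCondor feature and has not been tested against an HTCondor install: it only asks the Windows tasklist command whether an image named condor_kbdd.exe is present. The function names kbdd_listed and kbdd_is_running are hypothetical, and what to do when the process is missing is deliberately left out.

```python
import csv
import io
import subprocess

# Assumption: the daemon shows up in the process list under this image name.
IMAGE_NAME = "condor_kbdd.exe"


def kbdd_listed(tasklist_csv, image_name=IMAGE_NAME):
    """Return True if image_name appears in `tasklist /FO CSV /NH` output.

    Each matching process is one CSV row whose first field is the image
    name. When nothing matches, tasklist prints an informational message
    instead of CSV rows, which simply fails the comparison below.
    """
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row and row[0].strip().lower() == image_name.lower():
            return True
    return False


def kbdd_is_running(image_name=IMAGE_NAME):
    """Ask Windows for processes with the given image name (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    return kbdd_listed(result.stdout, image_name)


if __name__ == "__main__":
    print("running" if kbdd_is_running() else "not running")
```

Keeping the parsing in kbdd_listed separate from the tasklist call means the check can be tried against saved tasklist output on any machine before it is deployed to the Windows clients.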