Hi Thomas,
A good starting point is to find out which command was being sent, so you can discern what HTCondor was trying to do when this failure occurred. To do this, you can add D_COMMAND:1 to the collector's debug setting (COLLECTOR_DEBUG).
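
As a minimal sketch, assuming you want to keep whatever debug flags are already set on the central manager, the configuration change could look like this (the comment lines and the choice to append rather than overwrite are my assumptions, not a requirement):

```
# On the central manager: log incoming commands in the collector log.
# $(COLLECTOR_DEBUG) keeps any debug flags that are already configured.
COLLECTOR_DEBUG = $(COLLECTOR_DEBUG) D_COMMAND:1
```

After changing the configuration, running condor_reconfig on the central manager should make the collector pick up the new setting. The extra command messages should then show up in the collector log next to the "Can't receive command request" lines.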
-Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Thomas Hartmann <thomas.hartmann@xxxxxxx>
Sent: Wednesday, July 26, 2023 6:32 AM
To: condor-users@xxxxxxxxxxx <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] collector complaining about not receiving command requests from execution points

Hi all,
I am wondering about my test cluster's central manager collector, which is complaining about broken commands from execution points [1].

In principle, master and startd daemons are allowed to advertise on the collector side, and the workers look healthy and show up in the slot list. All looks good to me, so I am not sure which received commands(?) are supposed to be broken or timing out (assuming that there were no ads in the broken requests).

Cheers,
Thomas

[1]
07/26/23 12:35:16 Got INVALIDATE_ADS_GENERIC
07/26/23 12:35:16 Walking tables to invalidate... O(n)
07/26/23 12:35:16 (Invalidated 0 ads)
07/26/23 12:35:16 DaemonCore: Can't receive command request from 131.169.223.162 (perhaps a timeout?)
07/26/23 12:35:43 Got INVALIDATE_ADS_GENERIC
07/26/23 12:35:43 Walking tables to invalidate... O(n)
07/26/23 12:35:43 (Invalidated 0 ads)