On 06.03.19 at 20:30, Todd Tannenbaum wrote:
> On 3/6/2019 5:56 AM, Oliver Freyermuth wrote:
>> Dear HTCondor experts,
>>
>> after a short (~5 minute) DNS and partial network outage today, we've
>> observed several cases of:
>>
>> PERMISSION DENIED to condor_pool@xxxxxxxxxx from host XXX.YYY.ZZZ.XXX
>> for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason:
>> cached result for ADVERTISE_MASTER; see first case for the full reason
>>
>> on the Central Manager (i.e. the collector), which persisted over hours.
>> It seems the cache entries never expire?
>>
> [snip]
>> I cannot make out an automatic expiration of such DENY entries from
>> temporary DNS failures.
>>
>> Is the only way to recover from something like this a restart of the
>> collector, or am I missing something?
>>
>
>
> Hi Oliver,
>
> The cached ALLOW/DENY entries should be purged periodically (approx. every 8 hours by default), or whenever an admin does a condor_reconfig.
>
> The condor_config knob DNS_CACHE_REFRESH can be used to change from the eight-hour default; the value is in seconds.
>
> Since you are looking at the code, note function IPVerify:refreshDNS() which is invoked upon reconfig, and also note a timer is set up to call this method periodically based on the DNS_CACHE_REFRESH knob at
>
> https://github.com/htcondor/htcondor/blob/master/src/condor_daemon_core.V6/daemon_core.cpp#L2971-L2987

Hi Todd,

that's cool, many thanks for the pointer to the place in the code!
Nice technique to just reset the init flag and queue recreation of the PermHashTable upon reconfig and with a timer.

For our case, where DNS is usually extremely stable (the failure was an announced maintenance), I'll probably just run condor_reconfig manually in such cases in the future.
Good to see there's a knob in case we decide to reduce the interval :-).

Best regards and many thanks for the insightful reply!
	Oliver

>
> regards,
> Todd
>
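
For reference, the knob mentioned above could be set roughly as follows. This is only a sketch: the one-hour value is an illustrative assumption, not a recommendation from the thread, and which local config file it goes into depends on the site setup.

```
# Hypothetical fragment for a local condor_config file on the Central Manager.
# DNS_CACHE_REFRESH is given in seconds; per the reply above, the default
# is approximately eight hours. 3600 (one hour) is an example value only.
DNS_CACHE_REFRESH = 3600
```

Running condor_reconfig on the host afterwards should pick up the new value and, as described in the reply, also purges the cached ALLOW/DENY entries at that moment.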