Hello.
The memory usage of our condor_negotiator process started climbing
slowly and continuously after I enabled preemption. It takes about 48
hours to go from 0 to 25GB at a fairly constant rate, at which point
our central manager runs out of memory. Before I enabled preemption,
condor_negotiator used at most 1GB of memory.
Is that normal? Our pool has about 6000 cores and about 20k jobs in
the queue. Upgrading the central manager from 8.3.8 to 8.5.1 didn't
help (all other machines in our pool run 8.3.8). I didn't see anything
obviously wrong in the logs.
This behavior started when I replaced
NEGOTIATOR_CONSIDER_PREEMPTION = False
with
NEGOTIATOR_CONSIDER_PREEMPTION = True
ALLOW_PSLOT_PREEMPTION = True
PREEMPTION_REQUIREMENTS = False
(we only do rank-based preemption)
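
In case it helps to quantify the growth, below is a minimal Python sketch of how the resident memory of the negotiator could be sampled and turned into a rate. It assumes a Linux central manager where /proc/<pid>/status is readable. The function names are hypothetical helpers of my own, not part of HTCondor.

```python
import re


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) found in the text of /proc/<pid>/status.

    Returns None if the text contains no VmRSS line.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size (in kB) of a process on Linux."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def growth_rate_gb_per_hour(samples):
    """Average growth between the first and last sample.

    samples is a list of (hours_elapsed, rss_gb) tuples, oldest first.
    """
    (t_first, rss_first), (t_last, rss_last) = samples[0], samples[-1]
    if t_last == t_first:
        raise ValueError("samples must span a non-zero time interval")
    return (rss_last - rss_first) / (t_last - t_first)
```

With the figures from this report, growth_rate_gb_per_hour([(0, 0), (48, 25)]) comes to roughly 0.52 GB per hour. Calling read_rss_kb() on the negotiator's PID at regular intervals would show whether the climb really is linear.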