Hi,

On our cluster, COLLECTOR_PERSISTENT_AD_LOG was configured to point at /var/log/condor/AbsentLog.

After seeing that some machines were unable to contact the collector, I decided to restart it… and things started failing. Condor stopped responding on the collector VM and could not be restarted; a kill -9 on the collector was required to really stop it.

I finally found the following:

    # du -h /var/log/condor/AbsentLog
    17G     /var/log/condor/AbsentLog

=> I stopped condor, killed the collector with kill -9, and restarted condor, and voilà: the collector was back up and running in a few seconds.

Our cluster has 287 machines. Is such a huge file expected, given that it apparently severely impacts the collector restart? Or is this an ever-growing file that must be cleaned up from time to time?

Thanks
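
P.S. For anyone who wants to watch for the same symptom, here is a minimal sketch of a size check on the persistent ad log. It is not part of Condor: the function name log_size_warning and the 1 GiB default threshold are assumptions made up for illustration, and only the path comes from our configuration.

```python
import os


def log_size_warning(path, limit_bytes=1 << 30):
    """Return a warning string if the file at `path` is larger than
    `limit_bytes`, otherwise None.

    The 1 GiB default is an arbitrary, assumed threshold.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        # File is missing or unreadable: nothing to report.
        return None
    if size > limit_bytes:
        gib = 2 ** 30
        return "%s is %.1f GiB (limit %.1f GiB)" % (
            path, size / gib, limit_bytes / gib)
    return None


if __name__ == "__main__":
    # Path taken from our COLLECTOR_PERSISTENT_AD_LOG setting.
    message = log_size_warning("/var/log/condor/AbsentLog")
    if message:
        print(message)
```

Run from cron on the collector host, this would only flag that the file has grown large; it does not explain why it grows or whether it is safe to truncate.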