[HTCondor-users] COLLECTOR_PERSISTENT_AD_LOG huge file and collector restart

UW Madison Computer Sciences Department, Computer Systems Lab: Mailing List Archives

Date: Fri, 06 Sep 2019 13:46:41 +0000

From: SCHAER Frederic <frederic.schaer@xxxxxx>

Subject: [HTCondor-users] COLLECTOR_PERSISTENT_AD_LOG huge file and collector restart

Hi,

On our cluster, COLLECTOR_PERSISTENT_AD_LOG was configured to point at /var/log/condor/AbsentLog.

Seeing that some machines were unable to contact the collector, I decided to restart it… and things started failing. Condor stopped responding on the collector VM, and it was impossible to restart condor.

A kill -9 on the collector was required to really stop it…

I finally figured out the following:

# du -h /var/log/condor/AbsentLog

17G /var/log/condor/AbsentLog

=> condor stop + kill -9 on the collector, restarted condor, and voilà: the collector was back up and running in a few seconds.

Our cluster has 287 machines… is it expected for this file to get so huge that it apparently severely impacts the collector restart?

Or is this an ever-growing file that must be cleaned up from time to time?
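
Until that is clarified, a size check on the file could at least flag the growth before the next restart. The following is only a minimal sketch: the function name check_log_size and the threshold are hypothetical and not part of HTCondor, and it assumes the file is the one COLLECTOR_PERSISTENT_AD_LOG points at.

```shell
# Hypothetical helper (not an HTCondor tool): warn when a file is larger
# than a given limit in MiB.
# Returns 0 if within the limit, 1 if over it, 2 if the file is missing.
check_log_size() {
    file="$1"
    limit_mib="$2"
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    # wc -c reports the size in bytes; $(( )) strips any padding in its output
    size_bytes=$(($(wc -c < "$file")))
    limit_bytes=$((limit_mib * 1024 * 1024))
    if [ "$size_bytes" -gt "$limit_bytes" ]; then
        echo "WARNING: $file is $size_bytes bytes (limit: $limit_mib MiB)"
        return 1
    fi
    echo "OK: $file is $size_bytes bytes"
    return 0
}

# Example call, with an arbitrary 1024 MiB limit chosen for illustration:
#   check_log_size /var/log/condor/AbsentLog 1024
# The configured path can be looked up with:
#   condor_config_val COLLECTOR_PERSISTENT_AD_LOG
```

Run from cron on the collector VM, a non-zero return code could trigger an alert. This only detects the growth; it does not answer whether the file is safe to clean up.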

Thanks
