[HTCondor-users] HTCondor-CE not purging finished jobs
- Date: Sat, 16 May 2020 18:05:59 +0200
- From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>

Hello,
htcondor-ce-3.4.0-1.el7.noarch here.
We have a problem common to all of our CEs:
[root@ce02-htc ~]# condor_ce_q -cons '(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)' -af Owner | sort | uniq -c
   9592 user1
      4 user2
   1114 user3
    575 user4
     44 user5
I have set up SYSTEM_PERIODIC_REMOVE and SYSTEM_PERIODIC_REMOVE_REASON rules:
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8)
SYSTEM_PERIODIC_REMOVE_REASON = strcat("CE job removed by SYSTEM_PERIODIC_REMOVE due to ", ifThenElse((JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8), "being in the hold state for 8 hours.", ifThenElse((JobStatus == 5 && isUndefined(RoutedToJobId)), "non-existent route or entry in JOB_ROUTER_ENTRIES.", "input files missing." ) ) )
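
To make the branch order of the two expressions above easier to follow, here is a minimal Python sketch of their logic. It is only a simplified model: the function names and the dictionary-based job representation are hypothetical, and real ClassAd evaluation (in particular the handling of undefined attributes) is not reproduced.

```python
HELD = 5                 # JobStatus value tested in the expressions above
EIGHT_HOURS = 3600 * 8   # same threshold as in SYSTEM_PERIODIC_REMOVE


def periodic_remove(job, now):
    """Simplified model of SYSTEM_PERIODIC_REMOVE (hypothetical helper)."""
    return job["JobStatus"] == HELD and now - job["EnteredCurrentStatus"] > EIGHT_HOURS


def remove_reason(job, now):
    """Simplified model of the nested ifThenElse in SYSTEM_PERIODIC_REMOVE_REASON."""
    prefix = "CE job removed by SYSTEM_PERIODIC_REMOVE due to "
    # First branch: the same condition as the remove expression itself.
    if job["JobStatus"] == HELD and now - job["EnteredCurrentStatus"] > EIGHT_HOURS:
        return prefix + "being in the hold state for 8 hours."
    # Second branch: held job with no RoutedToJobId attribute.
    if job["JobStatus"] == HELD and job.get("RoutedToJobId") is None:
        return prefix + "non-existent route or entry in JOB_ROUTER_ENTRIES."
    # Fallback branch.
    return prefix + "input files missing."


# Made-up sample job: held for nine hours and never routed.
now = 1_000_000
job = {"JobStatus": HELD, "EnteredCurrentStatus": now - 9 * 3600}
print(periodic_remove(job, now))
print(remove_reason(job, now))
```

In this model, whenever the remove condition is true the first branch of the reason is taken, which agrees with the RemoveReason text reported for the stuck jobs.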
Inspecting these non-purged jobs shows that they have a RemoveReason set, yet they are still in the queue:
[root@ce02-htc ~]# condor_ce_q 1679707.0 -af JobStatus RemoveReason
5 CE job removed by SYSTEM_PERIODIC_REMOVE due to being in the hold state for 8 hours.
So far I have found no better way than removing these jobs manually with something like:
condor_ce_q -cons '(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)' -af 'strcat(ClusterId,".",ProcId)' | xargs condor_ce_rm
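
The same manual cleanup could be wrapped in a small script with a dry-run mode, so that the list of jobs can be checked before anything is removed. The sketch below is only an illustration: the function names, the batch size and the dry-run switch are hypothetical, and it uses only the condor_ce_q options and the condor_ce_rm invocation already shown in the pipeline above.

```python
import subprocess

# Constraint copied from the manual pipeline above.
CONSTRAINT = "(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)"


def query_command(constraint=CONSTRAINT):
    """Build the condor_ce_q argument list that prints one ClusterId.ProcId per line."""
    return ["condor_ce_q", "-cons", constraint, "-af", 'strcat(ClusterId,".",ProcId)']


def parse_job_ids(output):
    """Turn the query output into a list of job IDs, skipping blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def remove_held_jobs(dry_run=True, batch_size=500):
    """Query the matching jobs and pass them to condor_ce_rm in batches.

    With dry_run=True (the default) the condor_ce_rm commands are only printed.
    """
    result = subprocess.run(query_command(), capture_output=True, text=True, check=True)
    job_ids = parse_job_ids(result.stdout)
    for start in range(0, len(job_ids), batch_size):
        cmd = ["condor_ce_rm", *job_ids[start:start + batch_size]]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return job_ids
```

Calling remove_held_jobs() on a CE would print the commands it would run; remove_held_jobs(dry_run=False) would actually execute them.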
Am I missing something obvious?
Cheers,
Stefano