[HTCondor-users] HTCondor-CE not purging finished jobs
- Date: Sat, 16 May 2020 18:05:59 +0200
- From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
- Subject: [HTCondor-users] HTCondor-CE not purging finished jobs
Hello,
We are running htcondor-ce-3.4.0-1.el7.noarch.
We have a problem common to all of our CEs: held jobs whose proxies
expired long ago accumulate in the queue:
[root@ce02-htc ~]# condor_ce_q -cons '(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)' -af Owner | sort | uniq -c
9592 user1
4 user2
1114 user3
575 user4
44 user5
I have set up the following SYSTEM_PERIODIC_REMOVE and
SYSTEM_PERIODIC_REMOVE_REASON rules:
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8)
SYSTEM_PERIODIC_REMOVE_REASON = strcat("CE job removed by SYSTEM_PERIODIC_REMOVE due to ", \
    ifThenElse((JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8), \
        "being in the hold state for 8 hours.", \
        ifThenElse((JobStatus == 5 && isUndefined(RoutedToJobId)), \
            "non-existent route or entry in JOB_ROUTER_ENTRIES.", \
            "input files missing.")))
Inspecting these unpurged jobs shows that they have a RemoveReason set,
yet they remain in the queue:
[root@ce02-htc ~]# condor_ce_q 1679707.0 -af JobStatus RemoveReason
5 CE job removed by SYSTEM_PERIODIC_REMOVE due to being in the hold state for 8 hours.
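
To see how many jobs are in this state, a query along the following lines
might help. It is only a sketch: the function name list_unpurged_jobs is
made up for illustration, and it assumes that "still held but with a
RemoveReason set" is the right signature for the affected jobs.

```shell
#!/bin/bash
# Sketch only: list_unpurged_jobs is a hypothetical helper name, not an
# HTCondor-CE command.

# List jobs that are still held (JobStatus == 5) although a RemoveReason
# has already been recorded. "=!= undefined" is the ClassAd way to test
# that an attribute is defined.
list_unpurged_jobs() {
    condor_ce_q -cons 'JobStatus == 5 && RemoveReason =!= undefined' \
        -af ClusterId ProcId Owner RemoveReason
}
```

Calling list_unpurged_jobs on a CE prints one line per affected job, which
can then be piped through sort and uniq -c as in the first command.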
So far I have found no better way than removing these jobs manually with
something like:
condor_ce_q -cons '(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)' \
    -af 'strcat(ClusterId,".",ProcId)' | xargs condor_ce_rm
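
As a stopgap, the manual cleanup above could be wrapped in a small function
that defaults to a dry run, so the list of jobs can be reviewed before
anything is removed. This is a rough sketch under my own assumptions: the
function name purge_stale_held_jobs and the DRY_RUN variable are
hypothetical, and only condor_ce_q and condor_ce_rm are real commands.

```shell
#!/bin/bash
# Sketch only: purge_stale_held_jobs and DRY_RUN are hypothetical names,
# not part of HTCondor-CE.

# Same constraint as the manual command: held jobs whose proxy expired
# more than 4 hours ago.
STALE_CONSTRAINT='(JobStatus == 5) && (time() - x509UserProxyExpiration > 4 * 3600)'

purge_stale_held_jobs() {
    local job_id
    # Print matching jobs as ClusterId.ProcId, one per line.
    condor_ce_q -cons "$STALE_CONSTRAINT" -af 'strcat(ClusterId,".",ProcId)' |
    while read -r job_id; do
        # Skip blank lines in the query output.
        [ -n "$job_id" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only report what would be removed.
            echo "would remove $job_id"
        else
            # Keep condor_ce_rm from consuming the job list on stdin.
            condor_ce_rm "$job_id" </dev/null
        fi
    done
}
```

Running purge_stale_held_jobs prints the candidates; running it as
DRY_RUN=0 purge_stale_held_jobs actually calls condor_ce_rm on each one.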
Am I missing something obvious?
Cheers,
Stefano