Hello,
We are running htcondor-ce-3.4.0-1.el7.noarch.
All of our CEs accumulate held jobs whose proxy expired more than four hours ago:
[root@ce02-htc ~]# condor_ce_q -cons '(JobStatus == 5) && (time() - x509UserProxyExpiration > 4 * 3600)' \
    -af Owner | sort | uniq -c
9592 user1
4 user2
1114 user3
575 user4
44 user5
I have set up SYSTEM_PERIODIC_REMOVE and SYSTEM_PERIODIC_REMOVE_REASON rules:
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8)
SYSTEM_PERIODIC_REMOVE_REASON = strcat("CE job removed by SYSTEM_PERIODIC_REMOVE due to ", \
    ifThenElse((JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8), \
        "being in the hold state for 8 hours.", \
        ifThenElse((JobStatus == 5 && isUndefined(RoutedToJobId)), \
            "non-existent route or entry in JOB_ROUTER_ENTRIES.", \
            "input files missing.")))
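To make the branch order of the reason expression easier to follow, here is a rough shell mirror of the nested ifThenElse. It is only a sketch: the function name `remove_reason` and its three arguments are made up for illustration, an empty third argument stands in for an undefined RoutedToJobId, and ClassAd undefined/error semantics are not modeled.

```shell
#!/bin/bash
# Hypothetical helper mirroring the nested ifThenElse in SYSTEM_PERIODIC_REMOVE_REASON.
# Arguments: JobStatus, seconds spent in the current status, RoutedToJobId ("" = undefined).
remove_reason() {
    local job_status=$1 secs_in_status=$2 routed_to_job_id=$3
    local prefix="CE job removed by SYSTEM_PERIODIC_REMOVE due to "
    if [ "$job_status" -eq 5 ] && [ "$secs_in_status" -gt $((3600 * 8)) ]; then
        # Same condition as SYSTEM_PERIODIC_REMOVE itself
        echo "${prefix}being in the hold state for 8 hours."
    elif [ "$job_status" -eq 5 ] && [ -z "$routed_to_job_id" ]; then
        echo "${prefix}non-existent route or entry in JOB_ROUTER_ENTRIES."
    else
        echo "${prefix}input files missing."
    fi
}

remove_reason 5 30000 ""      # held longer than 8 hours
remove_reason 5 100 ""        # held, no routed job
remove_reason 5 100 "42.0"    # held, routed job present
```

Since the first branch tests the same condition as SYSTEM_PERIODIC_REMOVE, a removal triggered by this rule should always pick the "hold state for 8 hours" text, which matches the RemoveReason seen on the stuck jobs.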
Inspecting these unpurged jobs shows that they have RemoveReason set, yet
they are still in the queue in the held state:
[root@ce02-htc ~]# condor_ce_q 1679707.0 -af JobStatus RemoveReason
5 CE job removed by SYSTEM_PERIODIC_REMOVE due to being in the hold state for 8 hours.
So far I have found no better way than removing these jobs manually with
something like:
condor_ce_q -cons '(JobStatus == 5) && (time() - x509UserProxyExpiration > 4 * 3600)' \
    -af 'strcat(ClusterId,".",ProcId)' | xargs condor_ce_rm
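A slightly more defensive variant of this manual cleanup might look like the sketch below. The function name `remove_jobs` and the `RM_CMD` variable are made up for illustration; the batch size of 500 is arbitrary, and `-r` assumes GNU xargs (as shipped on EL7).

```shell
#!/bin/bash
# Sketch: batch-remove job IDs ("ClusterId.ProcId", one per line) read from stdin.
# RM_CMD is a hypothetical variable; set RM_CMD=echo for a dry run.
remove_jobs() {
    # -r: run nothing when stdin is empty; -n 500: at most 500 job IDs per call
    xargs -r -n 500 "${RM_CMD:-condor_ce_rm}"
}

# Dry run with two made-up job IDs; prints them instead of removing anything.
printf '1679707.0\n1679708.0\n' | RM_CMD=echo remove_jobs
```

In real use the function would be fed from the condor_ce_q pipeline instead of printf. The dry run makes it possible to check the selected job IDs before anything is removed, and `-r` avoids calling the removal command with no arguments when the query matches nothing.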
Am I missing something obvious?
Cheers,
Stefano