
Re: [HTCondor-users] Missing x509* attributes in routed jobs



Hi John,
I've found a bunch of "immortal jobs" in the CEs. They seem to survive the SYSTEM_PERIODIC_REMOVE because
EnteredCurrentStatus never gets old enough:

[root@ce06-htc ~]# condor_ce_q 3860445.8 -af 'formattime(QDate,"%x-%X")' 'formattime(JobStartDate,"%x-%X")' 'formattime(JobLastStartDate,"%x-%X")' 'formattime(JobCurrentStartDate,"%x-%X")' 'formattime(EnteredCurrentStatus,"%x-%X")'
04/12/22-21:17:35 04/12/22-21:24:50 03/20/23-17:38:59 03/20/23-19:41:20 03/20/23-19:41:29

[root@ce06-htc ~]# cccv -v SYSTEM_PERIODIC_REMOVE
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5 && time() - EnteredCurrentStatus > 3600*24) || (other stuff)
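To make the arithmetic concrete, here is a minimal Python sketch of the first clause of that expression, fed with the QDate and EnteredCurrentStatus values printed above. The evaluation time `now` is a hypothetical value, the job is assumed to be in JobStatus 5 at that moment, and the `%x` dates are assumed to be month/day/year. It only illustrates why the clause stays false as long as EnteredCurrentStatus keeps being refreshed; it is not how the schedd evaluates the policy.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y-%H:%M:%S"          # assumed layout of the "%x-%X" output above
HOLD_LIMIT = timedelta(seconds=3600 * 24)


def hold_clause_fires(job_status, entered_current_status, now):
    """Mirror of: JobStatus == 5 && time() - EnteredCurrentStatus > 3600*24."""
    return job_status == 5 and (now - entered_current_status) > HOLD_LIMIT


# Values copied from the condor_ce_q output above.
q_date = datetime.strptime("04/12/22-21:17:35", FMT)
entered = datetime.strptime("03/20/23-19:41:29", FMT)

# Hypothetical evaluation time, chosen only for illustration.
now = datetime.strptime("03/21/23-09:00:00", FMT)

# The job has been in the queue for many months...
print("queued longer than a day:", (now - q_date) > HOLD_LIMIT)
# ...but the clause only looks at the time since the last status change.
print("hold clause fires:", hold_clause_fires(5, entered, now))
```

With these inputs the clause is false even though the job was submitted months earlier, because only the time since the last status change is compared against the one-day limit.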

I'm going to remove these jobs and see if more are to come.
Stefano

On 20/03/23 19:46, John M Knoeller via HTCondor-users wrote:
Jaime and I looked at the code and we don't see any way that these attributes can be cleared once set unless you are running a version of HTCondor from about 10 years ago.  

Are you sure that the routed job ever had the X509UserProxyVOName attribute?

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Dal Pra
Sent: Thursday, March 16, 2023 11:52 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Missing x509* attributes in routed jobs

Hello,
condor-9.0.17-1.el7, htcondor-ce-5.1.6-1.el7 here.

I have job history files (routed jobs) without x509* attributes.
I looked into that a bit more, and it looks like the x509* attributes are
lost when JobStatus moves from 2 (running) to 5 (held).

Example:
[root@ce04-htc ~]# condor_q -cons 'acctgroup == "belle"' -af jobstatus x509userproxyvoname | sort | uniq -c
     46 2 belle
    322 5 undefined

[root@ce04-htc ~]# condor_history -cons 'acctgroup == "belle"' -af lastjobstatus jobstatus x509userproxyvoname | sort | uniq -c
    197 2 4 belle
   2256 5 3 undefined

[root@ce03-htc ~]# condor_q -cons 'x509UserProxyVOName =?= undefined' -af acctgroup jobstatus | sort -u
atlas 1
belle 3
belle 5

[root@ce03-htc ~]# condor_ce_q -cons 'regexp("belle",owner)' -af x509UserProxyVOName | sort -u
belle

The jobs in the condor queue have the x509UserProxyVOName attribute when 
routed by the condor-ce,
but this seems to disappear when the job moves from jobstatus 2 to 5.

This is happening with belle only right now, but I have seen it happen
with other VOs too, all of them using DIRAC.

Any clue on why this could be happening?

Thanks
Stefano
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
