Hi all,

another question/observation: we have noticed odd behaviour on one of our EPs [1]. The node seems to have collapsed into a black hole three weeks ago, i.e., all StarterLog.slot* activity stopped around March 1st [2]. However, the startd has been accepting and "starting" jobs all along [3], sending the jobs to their doom.
I have not yet found a smoking gun in the master or startd logs (unfortunately, our log replication does not reach back to the beginning of March).
Has somebody maybe observed something similar?

Cheers,
  Thomas

[1]
condor-9.0.8-1.el7.x86_64
condor-boinc-7.16.16-1.el7.x86_64
condor-classads-9.0.8-1.el7.x86_64
condor-externals-9.0.8-1.el7.x86_64
condor-procd-9.0.8-1.el7.x86_64
htcondor-ce-client-5.1.3-1.el7.noarch
python2-condor-9.0.8-1.el7.x86_64
python3-condor-9.0.8-1.el7.x86_64

[2]
[root@batch0653 ~]# ls -alltr /var/log/condor/StarterLog* | tail -n 5
-rw-r--r-- 1 25411 1000 4992974 Mar 1 22:51 /var/log/condor/StarterLog.slot1_6
-rw-r--r-- 1 25411 1000 1928326 Mar 1 23:36 /var/log/condor/StarterLog.slot1_3
-rw-r--r-- 1 25411 1000 5323270 Mar 2 04:47 /var/log/condor/StarterLog.slot1_8
-rw-r--r-- 1 25411 1000 5730429 Mar 2 05:56 /var/log/condor/StarterLog.slot1_7
-rw-r--r-- 1 25411 1000 3578995 Mar 2 07:28 /var/log/condor/StarterLog.slot1_10
[root@batch0653 condor]# stat StarterLog.slot1_3
  File: ‘StarterLog.slot1_3’
  Size: 1928326    Blocks: 3776    IO Block: 4096   regular file
Device: 806h/2054d    Inode: 524483    Links: 1
Access: (0644/-rw-r--r--)  Uid: (25411/ UNKNOWN)   Gid: ( 1000/ UNKNOWN)
Access: 2024-03-21 14:05:38.397796356 +0100
Modify: 2024-03-01 23:36:56.630725665 +0100
Change: 2024-03-01 23:36:56.630725665 +0100
 Birth: -

[3]
[root@batch0653 condor]# grep "slot1_3" StartLog | grep "Owner -> Claimed" | head -n 3
03/21/24 14:36:47 slot1_3: Changing state: Owner -> Claimed
03/21/24 14:37:13 slot1_3: Changing state: Owner -> Claimed
03/21/24 14:37:39 slot1_3: Changing state: Owner -> Claimed
[root@batch0653 condor]# grep "slot1_3" StartLog | grep "Owner -> Claimed" | wc -l
45
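
For what it is worth, the manual comparison in [2] and [3] (slots that keep getting claimed while their StarterLog no longer changes) could probably be scripted along these lines. This is only a rough sketch: the function names (last_claims, starter_mtimes, stale_slots), the one-hour threshold, and the assumption that the per-slot logs are named StarterLog.<slot> under /var/log/condor are my own choices for illustration, not something HTCondor provides.

```python
import glob
import os
import re
from datetime import datetime, timedelta

# Matches StartLog lines in the format quoted in [3], e.g.
# "03/21/24 14:36:47 slot1_3: Changing state: Owner -> Claimed"
CLAIM_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (slot\w+): "
    r"Changing state: Owner -> Claimed"
)
# Assumed per-slot log naming; rotated copies with extra suffixes are skipped.
STARTER_RE = re.compile(r"^StarterLog\.(slot\w+)$")


def last_claims(startlog_lines):
    """Return {slot: datetime of the latest 'Owner -> Claimed' line}."""
    latest = {}
    for line in startlog_lines:
        m = CLAIM_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
        slot = m.group(2)
        if slot not in latest or when > latest[slot]:
            latest[slot] = when
    return latest


def starter_mtimes(log_dir="/var/log/condor"):
    """Return {slot: last modification time of its StarterLog} (local time)."""
    mtimes = {}
    for path in glob.glob(os.path.join(log_dir, "StarterLog.slot*")):
        m = STARTER_RE.match(os.path.basename(path))
        if m:
            mtimes[m.group(1)] = datetime.fromtimestamp(os.path.getmtime(path))
    return mtimes


def stale_slots(claims, mtimes, max_lag=timedelta(hours=1)):
    """Slots claimed more than max_lag after their StarterLog last changed."""
    stale = []
    for slot, claimed_at in sorted(claims.items()):
        mtime = mtimes.get(slot)
        # A slot without any StarterLog at all is reported as stale too.
        if mtime is None or claimed_at - mtime > max_lag:
            stale.append(slot)
    return stale
```

On the node itself this would be called roughly as stale_slots(last_claims(open("/var/log/condor/StartLog")), starter_mtimes()). Both the StartLog timestamps and the file modification times are taken as local time, so no time zone handling is attempted.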