Hi Brian,

unfortunately, I have not found a smoking gun yet :-/

The CE is currently on [1]. SELinux is disabled by default, and on a quick check of the permissions I did not notice anything suspicious [2]. The files have the correct mapped user, including the CLUSTERID.log. The limit on open file handles should also be sufficient. On the fs side it is ext4, so nothing fancy. And I do not see much I/O wait or the like, which would point to an underlying issue with the HV.
I noticed several stack dumps on the CE, but AFAICS there has been no overlap between the affected PIDs/IDs and these jobs.
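In case it helps to repeat the ownership check across more spool directories than the one in [2], something like the following could do it. This is only a rough sketch: `find_foreign_files` is a made-up helper name (not part of HTCondor or HTCondor-CE), and it assumes that every entry below a job's spool directory should belong to the mapped user.

```python
import os


def find_foreign_files(spool_dir, expected_uid):
    """Return all paths below spool_dir that are NOT owned by expected_uid.

    Hypothetical helper for illustration; assumes every entry under a
    job's spool directory should belong to the mapped user.
    """
    mismatches = []
    for root, dirs, files in os.walk(spool_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat so that symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != expected_uid:
                mismatches.append(path)
    return sorted(mismatches)
```

It could be called with the numeric uid of the mapped account, e.g. `find_foreign_files("/var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0", pwd.getpwnam("belleprd000").pw_uid)`; an empty list would mean no ownership mismatch was found.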
Cheers,
  Thomas

[1]
condor-8.9.11-1.el7.x86_64
condor-boinc-7.16.11-1.el7.x86_64
condor-classads-8.9.11-1.el7.x86_64
condor-externals-8.9.11-1.el7.x86_64
condor-procd-8.9.11-1.el7.x86_64
htcondor-ce-4.4.1-3.el7.noarch
htcondor-ce-apel-4.4.1-3.el7.noarch
htcondor-ce-bdii-4.4.1-3.el7.noarch
htcondor-ce-client-4.4.1-3.el7.noarch
htcondor-ce-condor-4.4.1-3.el7.noarch
htcondor-ce-view-4.4.1-3.el7.noarch
python2-condor-8.9.11-1.el7.x86_64
python3-condor-8.9.11-1.el7.x86_64

CentOS Linux release 7.9.2009 (Core) @ 3.10.0-1160.11.1.el7.x86_64

[2]
root@grid-htcondorce0: [~] ls -all /var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0
total 80
drwx------ 2 belleprd000 belleprd  4096 Mar  3 06:51 .
drwxr-xr-x 4 condor      condor    4096 Mar  3 06:51 ..
-rw-r--r-- 1 belleprd000 belleprd  1028 Mar  3 10:36 406446.0.log
-rwxr-xr-x 1 belleprd000 belleprd 55919 Mar  3 06:51 DIRAC_nd5lYU_pilotwrapper.py
-rw------- 1 belleprd000 belleprd 10354 Mar  3 06:51 tmpBU9zHQ

> sestatus
SELinux status: disabled

> cat /proc/sys/fs/file-max
1552725