Hi Brian
unfortunately, I have not found a smoking gun yet :-/
The CE is currently running the versions listed in [1].
SELinux is disabled by default, and on a quick check of the permissions I did not notice anything suspicious [2]. The files are owned by the correct mapped user, including the CLUSTERID.log. The limit on open file handles should also be sufficient.
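The ownership check in [2] was done by eye with ls. A minimal Python sketch of the same check, assuming the per-job spool directory is itself owned by the mapped user (as it is in [2]), could flag any entry whose uid/gid differ from the directory's. The helper name foreign_owned_entries is hypothetical; it only inspects the top level of the directory and does not follow symlinks.

```python
from pathlib import Path


def foreign_owned_entries(spool_dir):
    """List entries in spool_dir whose uid/gid differ from the directory's own.

    Hypothetical helper: assumes the per-job spool directory is owned by the
    mapped user, as in listing [2]. Only the top level is checked.
    """
    root = Path(spool_dir)
    root_st = root.stat()
    expected = (root_st.st_uid, root_st.st_gid)
    mismatched = []
    for entry in sorted(root.iterdir()):
        st = entry.lstat()  # lstat: inspect symlinks themselves, do not follow them
        if (st.st_uid, st.st_gid) != expected:
            mismatched.append(entry.name)
    return mismatched
```

For example, foreign_owned_entries("/var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0") returning an empty list would mean every file matches the directory owner.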
On the filesystem side it is ext4, so nothing fancy. I also do not see significant I/O wait, which might have pointed to an underlying issue with the hypervisor.
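One way to put a number on the I/O wait is to sample the aggregate cpu line of /proc/stat twice and compare the deltas. The sketch below uses hypothetical function names and the field order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal. It also reports steal time, which on a VM is a common hint of contention on the hypervisor side; including it is an assumption about what is worth looking at, not part of the checks described here.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' line of /proc/stat as a list of jiffy counters."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("cpu "):
                return [int(v) for v in line.split()[1:]]
    raise ValueError(f"no aggregate cpu line in {path}")


def cpu_wait_shares(before, after):
    """Fraction of CPU time spent in iowait and steal between two samples.

    Only the first eight fields (user .. steal) are summed, because guest and
    guest_nice are already accounted for in user and nice.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return {"iowait": 0.0, "steal": 0.0}
    return {"iowait": deltas[4] / total, "steal": deltas[7] / total}
```

In practice one would call read_cpu_times(), wait a few seconds (e.g. time.sleep(5)), call it again, and pass both lists to cpu_wait_shares().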
I noticed several stack dumps on the CE, but AFAIS the PIDs/IDs affected there do not overlap with these jobs.
Cheers,
 Thomas
[1]
condor-8.9.11-1.el7.x86_64
condor-boinc-7.16.11-1.el7.x86_64
condor-classads-8.9.11-1.el7.x86_64
condor-externals-8.9.11-1.el7.x86_64
condor-procd-8.9.11-1.el7.x86_64
htcondor-ce-4.4.1-3.el7.noarch
htcondor-ce-apel-4.4.1-3.el7.noarch
htcondor-ce-bdii-4.4.1-3.el7.noarch
htcondor-ce-client-4.4.1-3.el7.noarch
htcondor-ce-condor-4.4.1-3.el7.noarch
htcondor-ce-view-4.4.1-3.el7.noarch
python2-condor-8.9.11-1.el7.x86_64
python3-condor-8.9.11-1.el7.x86_64
CentOS Linux release 7.9.2009 (Core) @ 3.10.0-1160.11.1.el7.x86_64
[2]
root@grid-htcondorce0: [~] ls -all /var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0
total 80
drwx------ 2 belleprd000 belleprd 4096 Mar 3 06:51 .
drwxr-xr-x 4 condor condor 4096 Mar 3 06:51 ..
-rw-r--r-- 1 belleprd000 belleprd 1028 Mar 3 10:36 406446.0.log
-rwxr-xr-x 1 belleprd000 belleprd 55919 Mar 3 06:51 DIRAC_nd5lYU_pilotwrapper.py
-rw------- 1 belleprd000 belleprd 10354 Mar 3 06:51 tmpBU9zHQ
> sestatus
SELinux status:                 disabled
> cat /proc/sys/fs/file-max
1552725
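file-max only gives the system-wide ceiling. The number of handles currently allocated is reported in /proc/sys/fs/file-nr (allocated, free, maximum, per proc(5)), so a minimal sketch for turning that into a headroom figure might look like this; the function names are hypothetical. Per-process limits (RLIMIT_NOFILE, visible in /proc/<pid>/limits) can be reached long before file-max, so they may be worth checking for the CE daemons as well.

```python
def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr: allocated, free, maximum."""
    allocated, free, maximum = (int(v) for v in text.split())
    return allocated, free, maximum


def file_handle_headroom(path="/proc/sys/fs/file-nr"):
    """Fraction of the system-wide file-handle limit that is still available."""
    with open(path) as fh:
        allocated, free, maximum = parse_file_nr(fh.read())
    in_use = allocated - free  # handles actually held by processes
    return 1.0 - in_use / maximum
```

Keeping the parser separate from the file read makes it easy to feed in a value copied from another host.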