[HTCondor-users] dags and max open files
- Date: Fri, 12 Aug 2022 12:34:06 -0500
- From: Michael Thomas <wart@xxxxxxxxxxx>
- Subject: [HTCondor-users] dags and max open files
We recently upgraded to HTCondor 9.0.15 (which may or may not be relevant)
and are now seeing some schedds report "Too many open files" errors, for
example:
08/12/22 10:43:02 (pid:4627) Daemon::startCommand(INVALIDATE_SUBMITTOR_ADS,...) making connection to <10.13.5.25:9618?alias=ldas-condor.ldas.ligo-la.caltech.edu>
08/12/22 10:43:02 (pid:4627) Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 24 (Too many open files)
08/12/22 10:43:02 (pid:4627) Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 24 (Too many open files)
08/12/22 10:43:02 (pid:4627) Can't open directory "/etc/condor/tokens.d" as PRIV_ROOT, errno: 24 (Too many open files)
08/12/22 10:43:02 (pid:4627) getTokenSigningKey(): read_secure_file(/etc/condor/condor_cred) failed!
08/12/22 10:43:02 (pid:4627) TOKEN: No token found.
08/12/22 10:43:02 (pid:4627) SECMAN: required authentication with collector ldas-condori failed, so aborting command INVALIDATE_SUBMITTOR_ADS.
I'm able to work around this by increasing the file descriptor limit on
the schedd from the default of 4096 with:
SCHEDD_MAX_FILE_DESCRIPTORS = 32768
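
To confirm that the running schedd actually picked up the new limit after a restart, something like the following could be used. It is only a sketch: it assumes a Linux host where /proc/<pid>/limits has the usual "Max open files" row, and the function name parse_open_files_limit is made up for this example.

```python
import sys


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Expected fields: 'Max', 'open', 'files', soft, hard, units
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


# Usage: python check_limit.py <pid of condor_schedd>
if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_open_files_limit(f.read())
    print(f"pid {pid}: soft limit {soft}, hard limit {hard}")
```

Reading another user's /proc entry generally needs root or the same uid as the process.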
Looking in /proc/$pid/fd for the condor_schedd process, I see almost all
open files are related to user .out, .err, and /dev/null fds from user
dagman jobs.
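
For reference, the breakdown of open files can be tallied with a short script along these lines. This is a rough sketch assuming Linux's /proc/<pid>/fd symlinks; the helper names (classify, summarize, fd_targets) and the grouping by file extension are choices made for this example, not anything HTCondor provides.

```python
import os
import sys
from collections import Counter


def classify(target):
    """Map an fd symlink target to a coarse category."""
    if target.endswith(" (deleted)"):
        # The kernel appends this suffix for files unlinked while still open.
        target = target[: -len(" (deleted)")]
    if target == "/dev/null":
        return "/dev/null"
    for prefix in ("socket:", "pipe:", "anon_inode:"):
        if target.startswith(prefix):
            return prefix.rstrip(":")
    ext = os.path.splitext(target)[1]
    return ext if ext else "(no extension)"


def summarize(targets):
    """Count fd targets per category."""
    return Counter(classify(t) for t in targets)


def fd_targets(pid):
    """Return the symlink targets of every entry in /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # fd was closed between listdir and readlink
    return targets


# Usage: python fd_summary.py <pid of condor_schedd>
if __name__ == "__main__" and len(sys.argv) > 1:
    for category, count in summarize(fd_targets(sys.argv[1])).most_common():
        print(f"{count:6d}  {category}")
```

If the .out, .err, and /dev/null categories dominate the output, that matches what is described above.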
Is it to be expected that there would be a lot of open files from dagman
jobs?
--Mike