[Condor-users] DAGs and Error msg "Too many open files"
- Date: Fri, 9 Dec 2011 06:55:17 -0700
- From: "Michael O'Donnell" <odonnellm@xxxxxxxx>
- Subject: [Condor-users] DAGs and Error msg "Too many open files"
I am running a DAG in our pool. All machines, including the central manager,
run Windows (XP, 7, R2), and about 200 machines are available for this
specific job. My DAG reports an error that changes its status from Running
to Idle (that is to say, the jobs themselves are running fine, but
condor_dagman.exe produces an error).
The error log file the DAG produces contains this:
12/09/11 06:22:48 Can't open "ExtSimVal_DAG.dag.dagman.out"
dprintf() had a fatal error in pid 9684
Can't open "ExtSimVal_DAG.dag.dagman.out"
errno: 24 (Too many open files)
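For what it is worth, errno 24 is EMFILE, which as far as I can tell usually
means the Microsoft C runtime's per-process limit on open stdio streams (512
by default) has been exhausted, not the OS-wide handle limit. Below is a
minimal sketch, assuming an MSVC build environment, of how a program can
query and raise that CRT limit; whether condor_dagman does anything like
this is a question for the Condor developers.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* The MSVC CRT allows 512 simultaneously open FILE streams by
           default, independent of the much larger OS handle limit;
           fopen() fails with errno 24 (EMFILE) once it is exhausted. */
        printf("current stdio limit: %d\n", _getmaxstdio());

        /* Ask the CRT for more streams; it caps the argument (2048 on
           CRTs of this era) and returns -1 if the request fails. */
        if (_setmaxstdio(2048) == -1) {
            perror("_setmaxstdio");
            return EXIT_FAILURE;
        }
        printf("raised stdio limit: %d\n", _getmaxstdio());
        return EXIT_SUCCESS;
    }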
The jobs I am running use an enterprise database (PostgreSQL) for most of
the data, but I am also using a Windows R2 machine as a second file server.
The second file server is where I write my Condor submit log files, and it
also stores about 575 GB of data that the project uses for the analysis.
I do not think this is a limitation of the Windows OS, but I am not
positive. I believe the maximum number of open files per session is 16384
(determined by running 'net config server'). So with 200 jobs running, 3
log files per submit file plus one open dataset per job would be
approximately 800 open files on this server.
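For reference, each of my submit files looks something like the sketch
below (the file names here are made up); the log, output, and error entries
are the three log files I am counting per job, all written to the second
file server.

    universe   = vanilla
    executable = analysis.exe
    # The three per-job log files, written to the second file server.
    log        = \\fileserver\condor\logs\job_$(Cluster).log
    output     = \\fileserver\condor\logs\job_$(Cluster).out
    error      = \\fileserver\condor\logs\job_$(Cluster).err
    queue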
I also read that there is a limit on how many files MS-DOS programs can
open, and that this can be changed in the following file:
C:\Windows\System32\config.nt (I had changed files=40 to files=5000).
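After that edit, the relevant part of config.nt reads roughly as follows
(the surrounding default lines vary by Windows version):

    dos=high, umb
    device=%SystemRoot%\system32\himem.sys
    files=5000

As I understand it, though, config.nt applies only to 16-bit programs run
under NTVDM, so it may not affect a native Windows program like Condor at
all.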
Despite this change, which I had made in the past for files opened by
MS-DOS programs, I am still getting this error. Does anyone with a
background in Microsoft operating systems have any idea how I might resolve
this? Or is it possible this is a limitation of Condor running on Windows?
(All machines within the pool are Windows, running Condor v7.6.1, with
approximately 300 nodes across 150 machines.)
Thank you for your help,
Mike