[Condor-users] "file descriptors" problem again
- Date: Mon, 04 Sep 2006 11:52:32 +0100
- From: "Dr Ian C. Smith" <i.c.smith@xxxxxxxxxxxxxxx>
- Subject: [Condor-users] "file descriptors" problem again
Hi,
I recently upgraded to Condor 6.8.0 on our central manager in order to
fix a problem with Condor. See:
https://lists.cs.wisc.edu/archive/condor-users/2006-August/msg00039.shtml
This solved the problem, but instead I started to see exactly the
same "out of file descriptors" errors as reported in
https://lists.cs.wisc.edu/archive/condor-users/2006-April/msg00191.shtml
The symptoms are the same: after the daily reboot of the Windows
execution hosts, a large number of them sit idle even though there is a
big queue (20,000 jobs) waiting to run. When I went back to 6.6.9 the
problem disappeared.
I'm wondering whether, as has been suggested, the "out of file descriptors"
message is a red herring: the OS is the same (Solaris 8) and none of the
limits have been changed. At most there are around 100 jobs running
concurrently in the vanilla universe. The default limit (ulimit -n) is 256
(although I understand that this is per process).
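For reference, the per-process limit can also be read from inside a running
process rather than from the shell. This is only a minimal sketch, assuming a
Unix host where Python's standard `resource` module is available; the function
name `fd_limits` is made up for illustration, and it says nothing about how
the Condor daemons themselves set or use their limits.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
# The soft limit is the one actually enforced; the hard limit is the ceiling
# an unprivileged process may raise its soft limit to.
print("soft limit:", soft, "hard limit:", hard)
```

The soft value corresponds to what `ulimit -n` reports in the shell that
started the process. Each process inherits its own copy of the limit from its
parent, so 100 concurrent jobs do not share a single pool of 256 descriptors.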
Any ideas about this? Would a diff(1) of the two versions' code show up
anything? I could move Condor-G to another host to get around the first
problem, but I'm more concerned that the Windows central manager is going
to get stuck with an out-of-date version of Condor.
cheers,
-ian.
-----------------------------------
Dr Ian C. Smith,
e-Science team,
University of Liverpool
Computing Services Department