[Condor-users] Lots of TIME_WAIT sockets killing server
- Date: Tue, 1 Jun 2010 14:19:39 +0200
- From: "J.A. Gutierrez" <spd@xxxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] Lots of TIME_WAIT sockets killing server
Hello
I've run into a problem with Condor and I can't find the cause:
Since we upgraded our Linux Condor slave ("execute") nodes
from Fedora Core 2 to CentOS 5.2 (and then to CentOS 5.4),
whenever Condor has been active for a couple of days, the Condor master host
gets its connection table filled with thousands of "TIME_WAIT"
sockets, so no new connections can be opened and the server
(which also acts as the central NFS/NIS+ server) gets killed.
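
For anyone trying to reproduce or narrow this down, here is a rough sketch of how the TIME_WAIT sockets could be tallied per remote host from saved "netstat -an" output. The function name count_time_wait is made up for illustration, and the two column layouts are assumptions: Linux-style lines that start with "tcp" and use "address:port", and Solaris-style lines that start with the local address and use "address.port". Adjust if your netstat prints something different.

```python
from collections import Counter


def count_time_wait(netstat_lines):
    """Count TIME_WAIT sockets per remote host in `netstat -an` style text.

    Assumed layouts (not verified against every netstat version):
      Linux:   proto recv-q send-q local:port remote:port state
      Solaris: local.port remote.port swind send-q rwind recv-q state
    """
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Only lines whose last column is the TIME_WAIT state are of interest.
        if not fields or fields[-1] != "TIME_WAIT":
            continue
        if fields[0].startswith("tcp"):
            # Linux-style line: remote endpoint is the fifth column.
            if len(fields) < 6:
                continue
            host = fields[4].rsplit(":", 1)[0]
        else:
            # Solaris-style line: remote endpoint is the second column.
            if len(fields) < 3:
                continue
            host = fields[1].rsplit(".", 1)[0]
        counts[host] += 1
    return counts


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "tcp        0      0 10.0.0.1:9618    10.0.0.5:40001   TIME_WAIT",
    "tcp        0      0 10.0.0.1:9618    10.0.0.5:40002   TIME_WAIT",
    "tcp        0      0 10.0.0.1:22      10.0.0.6:40100   ESTABLISHED",
    "10.0.0.1.9618   10.0.0.7.40003   49640   0 49640   0 TIME_WAIT",
]

for host, n in count_time_wait(sample).most_common():
    print(n, host)
```

If one execute node accounts for most of the entries, that at least shows which client is opening and closing connections fastest.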
Our current setup is:
* NFS/NIS+/Condor master server:
- Sun SPARC server running Solaris 8.
- Condor master version 7.4.2
* NFS/NIS+/Condor clients:
- x86 PCs running Linux CentOS 5.4
- Condor 7.4.2
(when the server starts becoming unresponsive, there are usually
no more than 6 PCs running Condor)
Condor configuration:
- Common FILESYSTEM_DOMAIN/UID_DOMAIN on master and slaves
- USE_NFS = False
- USE_AFS = False
- ~condor is local on every PC
- mostly default settings for everything
IIRC, the problem started with the upgrade from Fedora Core 2
to CentOS 5.2, while keeping the same Condor installation.
Then I upgraded Condor to the current release, but I got the same
problem.
Any idea?
Thanks...
--
PGP and other useless info at \
http://webdiis.unizar.es/~spd/ \
finger://daphne.cps.unizar.es/spd \ Timeo Danaos et dona ferentes
ftp://ivo.cps.unizar.es/pub/ \ (Virgilio)