[Condor-users] Lots of TIME_WAIT sockets killing server
- Date: Tue, 1 Jun 2010 14:19:39 +0200
 
- From: "J.A. Gutierrez" <spd@xxxxxxxxxxxxxxxxxxxx>
 
- Subject: [Condor-users] Lots of TIME_WAIT sockets killing server
 
	Hello,
	I've run into a problem with Condor and I can't find the cause.
	Since we upgraded our Linux Condor slave ("execute") nodes
	from Fedora Core 2 to CentOS 5.2 (and then to CentOS 5.4),
	whenever Condor has been active for a couple of days the Condor
	master host gets its connection table filled with thousands of
	sockets in the "TIME_WAIT" state, so no new connections can be
	opened and the server (which also acts as the central NFS/NIS+
	server) is effectively brought down.
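
	To see which peers the TIME_WAIT sockets belong to, the output of
	"netstat -an" on the master can be summarized per remote host. The
	following is only a minimal sketch, not part of Condor: the function
	name time_wait_by_peer and the remote_field parameter are made up for
	illustration, and the column positions are assumptions about the
	netstat output format (remote address in column 4 on Linux as
	"host:port", in column 1 on Solaris as "host.port"), so check them
	against the real output first.

```python
from collections import Counter


def time_wait_by_peer(netstat_output, remote_field):
    """Count TIME_WAIT sockets per remote host in `netstat -an` text.

    remote_field is the 0-based index of the remote-address column
    (assumed: 4 for Linux net-tools output, 1 for Solaris output).
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only look at lines whose last column is the TIME_WAIT state.
        if len(fields) <= remote_field or fields[-1] != "TIME_WAIT":
            continue
        addr = fields[remote_field]
        # The port follows the last ':' (Linux) or the last '.' (Solaris).
        cut = max(addr.rfind(":"), addr.rfind("."))
        host = addr[:cut] if cut > 0 else addr
        counts[host] += 1
    return counts
```

	Feeding it the saved netstat output and printing
	counts.most_common(10) would show whether the sockets come mostly
	from the execute nodes or from somewhere else.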
	Our current setup is:
	* NFS/NIS+/Condor master server:
	- Sun SPARC server running Solaris 8.
	- Condor master version 7.4.2
	* NFS/NIS+/Condor clients:
	- x86 PCs running Linux CentOS 5.4
	- Condor 7.4.2
	(when the server starts becoming unresponsive, there are usually
	no more than 6 PCs running Condor)
	Condor configuration:
	- Common FILESYSTEM_DOMAIN/UID_DOMAIN on master and slaves
	- USE_NFS = False 
	- USE_AFS = False
	- ~condor is local on every PC
	- mostly default settings for everything
	IIRC, the problem started with the upgrade from Fedora Core 2
	to CentOS 5.2, while keeping the same Condor installation.
	I then upgraded Condor to the current release, but the problem
	is still there.
	Any idea?
	Thanks...
-- 
PGP and other useless info at      \
http://webdiis.unizar.es/~spd/      \
finger://daphne.cps.unizar.es/spd    \       Timeo Danaos et dona ferentes
ftp://ivo.cps.unizar.es/pub/          \                         (Virgilio)