Re: [Condor-users] Computers missing from Condor pool
- Date: Tue, 26 Feb 2008 10:17:18 -0600
- From: Daniel Forrest <forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Computers missing from Condor pool
Rob,
> Thank you for your reply. I've been wary of changing to TCP because of
> the warnings in condor_config and the manual, as well as the effect it
> might have on network / system load, but I'm willing to explore this
> option further.
There are some other things to look at with UDP. Watch the output
of "netstat -su", in particular the "packet receive errors" counter
in the Udp section. If that number keeps going up, you are losing
packets.
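
To make that check repeatable, here is a minimal Python sketch that pulls
the counter out of "netstat -su" and prints how much it grew between
samples. The function names (parse_udp_receive_errors, watch) are made up
for illustration, and it assumes the net-tools output layout where section
headers such as "Udp:" start in column 0 and counters are indented.

```python
import re
import subprocess
import time


def parse_udp_receive_errors(netstat_output):
    """Return the 'packet receive errors' count from the Udp: section, or None."""
    in_udp = False
    for line in netstat_output.splitlines():
        if not line.startswith((" ", "\t")):
            # Section headers ("Udp:", "UdpLite:", ...) are not indented.
            in_udp = line.strip() == "Udp:"
            continue
        if in_udp:
            match = re.match(r"\s*(\d+) packet receive errors", line)
            if match:
                return int(match.group(1))
    return None


def read_udp_receive_errors():
    """Run 'netstat -su' once and return the current counter (or None)."""
    result = subprocess.run(
        ["netstat", "-su"], capture_output=True, text=True, check=True
    )
    return parse_udp_receive_errors(result.stdout)


def watch(interval_seconds=10, samples=6):
    """Print the counter and its growth every interval_seconds."""
    previous = read_udp_receive_errors()
    for _ in range(samples):
        time.sleep(interval_seconds)
        current = read_udp_receive_errors()
        if previous is not None and current is not None:
            print(f"packet receive errors: {current} (+{current - previous})")
        previous = current
```

Calling watch() on the collector host while the pool is busy should show
whether the counter is flat or climbing.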
One thing to try is increasing the size of the collector's socket buffer:
COLLECTOR_SOCKET_BUFSIZE = 10000000
Another is to raise the kernel's network buffer limits:
/etc/sysctl.conf:
net.core.rmem_default = 65535
net.core.wmem_default = 65535
net.core.rmem_max = 8388607
net.core.wmem_max = 8388607
net.ipv4.tcp_wmem = 4096 65536 8388607
net.ipv4.tcp_rmem = 4096 65536 8388607
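
The two settings interact: on Linux, a receive buffer requested by a
program is limited by net.core.rmem_max, so a large
COLLECTOR_SOCKET_BUFSIZE only helps as far as the kernel allows. A small
Python sketch to see what the kernel actually grants for a UDP socket
follows. The function name granted_rcvbuf is made up for illustration, and
the behaviour described is Linux's (which also reports roughly double the
stored request to account for bookkeeping overhead); other systems may
differ.

```python
import socket


def granted_rcvbuf(requested):
    """Request a UDP receive buffer of `requested` bytes and
    return the size the kernel reports afterwards."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Running granted_rcvbuf(10000000) before and after raising rmem_max shows
whether the larger request is being honoured or clamped.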
We have also done this:
MASTER_UPDATE_INTERVAL = $RANDOM_CHOICE(290,291,292,293,294,295,296,297,298,299,301,302,303,304,305,306,307,308,309,310)
UPDATE_INTERVAL = $RANDOM_CHOICE(290,291,292,293,294,295,296,297,298,299,301,302,303,304,305,306,307,308,309,310)
... on the compute nodes to keep them from flooding the collector all
at the same time (since they tend to sync up if you ever do a
condor_reconfig -all).
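
To illustrate why the spread helps, here is a rough Python simulation. It
assumes each node picks one interval from the list when it reads its
configuration, that all nodes start counting at the same moment (as after
a condor_reconfig -all), and that updates arrive instantly. The names
(CHOICES, peak_updates, jittered_intervals) are made up for the example;
this is not how Condor itself is implemented.

```python
import random
from collections import Counter

# Same values as the $RANDOM_CHOICE list above: 290..310 with 300 left out.
CHOICES = list(range(290, 300)) + list(range(301, 311))


def peak_updates(intervals, horizon):
    """Largest number of updates that land in the same second
    within `horizon` seconds, given one interval per node."""
    per_second = Counter()
    for interval in intervals:
        for t in range(interval, horizon + 1, interval):
            per_second[t] += 1
    return max(per_second.values(), default=0)


def jittered_intervals(n_nodes, rng):
    """Pick one interval per node at random from CHOICES."""
    return [rng.choice(CHOICES) for _ in range(n_nodes)]
```

For example, compare peak_updates([300] * 500, 3600) with
peak_updates(jittered_intervals(500, random.Random()), 3600): with a fixed
interval every node hits the collector in the same second, while the
randomized intervals spread the same updates over many different seconds.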
--
Daniel K. Forrest
Laboratory for Molecular and Computational Genomics
University of Wisconsin, Madison
forrest@xxxxxxxxxxxxx
(608) 262 - 9479