[Condor-users] Collector using a lot of CPU
- Date: Thu, 28 Apr 2005 15:36:39 -0400 (EDT)
- From: Leslie Groer <groer@xxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] Collector using a lot of CPU
I am running 11 dedicated worker nodes (dual CPU, Scientific Linux 3.0.3,
Condor 6.7.3) with 4 VMs, two separate schedulers and another scheduler on
the CM node (dual 2.4 GHz Xeon, 2 GB RAM, 1GbE interface). The
condor_collector process always seems to be at about 77% CPU.
  PID PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
10762  25   0  3832 3832  2120 R    76.3  0.1  63:58   2 condor_collecto
Is this normal? I believe the CM may be dropping UDP packets and hence is
removing VMs from the system. I upgraded to Condor 6.7.3, which is
supposed to help with this issue, but I still see VMs being dropped.
I also doubled the ClassAd lifetime and the timeouts in the collector and
negotiator:
CLASSAD_LIFETIME = 1800
CLIENT_TIMEOUT = 60
NEGOTIATOR_TIMEOUT = 60
but I still see stale ads being removed.
My concern is that I am running only 5% of our worker nodes in the system so
far. What happens when I scale up to 220 worker nodes? The next step is to
switch to TCP, but I was wondering whether some misconfiguration is causing the
collector to be too busy. The relevant debugging and other parameters are set
to:
ALL_DEBUG = D_PROTOCOL D_MATCH
COLLECTOR_CLASS_HISTORY_SIZE = 1024
COLLECTOR_DAEMON_HISTORY_SIZE = 128
COLLECTOR_DAEMON_STATS = True
COLLECTOR_DEBUG =
MAX_COLLECTOR_LOG = 640000000
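For a sense of scale, here is a rough back-of-envelope sketch of the update load at 11 and at 220 nodes. It is only an estimate under stated assumptions: it takes "4 VMs" to mean 4 VMs per node, and it assumes each VM sends one ad every 300 seconds, which is a value I have not verified against this pool's configuration. The function names are made up for illustration.

```python
# Back-of-envelope estimate of VM ad traffic reaching the collector.
# ASSUMPTION: each VM sends one ad per UPDATE_INTERVAL_S seconds; 300 s is
# assumed here and is not taken from the settings quoted in this message.
UPDATE_INTERVAL_S = 300
CLASSAD_LIFETIME_S = 1800   # from the settings quoted above
VMS_PER_NODE = 4            # ASSUMPTION: "4 VMs" read as 4 per worker node


def vm_ads_per_second(nodes, vms_per_node=VMS_PER_NODE,
                      interval_s=UPDATE_INTERVAL_S):
    """Average number of VM ads per second arriving at the collector."""
    return nodes * vms_per_node / interval_s


def missed_updates_before_expiry(lifetime_s=CLASSAD_LIFETIME_S,
                                 interval_s=UPDATE_INTERVAL_S):
    """Consecutive updates that must be lost before an ad expires."""
    return lifetime_s // interval_s


if __name__ == "__main__":
    for nodes in (11, 220):
        rate = vm_ads_per_second(nodes)
        print(f"{nodes:4d} nodes: {rate:.2f} VM ads/s on average")
    print("updates lost in a row before an ad expires:",
          missed_updates_before_expiry())
```

Under these assumptions the average VM ad rate is small even at 220 nodes, and an ad would expire only after several consecutive updates are lost. The sketch ignores schedd and other daemon ads, queries, and bursts, so it says nothing definite about where the CPU time is going.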
Thanks
Leslie Groer
--
,-~~-.___. ________________________________________________
/ | ' \ groer@xxxxxxxxxxxxxxxxxxx Department of Physics
( ) 0 Tel: +1-416-978-2959 University of Toronto
\_/-, ,----' Fax: +1-416-978-8221 60 St. George Street
==== // Toronto, ON M5S 1A7
/ \-'~; /~~~(O) Canada
/ __/~| / | Office: McLennan Physics Lab Room 911
=( _____| (_________| http://home.fnal.gov/~groer
Leslie S. Groer