Subject: [Condor-users] Condor and monitoring performance
Does anyone have suggestions on how to monitor the health of a Condor pool?
I am trying to track down an error (see Q3 below) and am also trying to
develop a set of commands for monitoring Condor.
1. I found this URL, which is helpful, but there appear to be some issues
on Windows.
URL: https://nmi.cs.wisc.edu/node/1481
I noticed that the condor_updates_stats binary is not included with Windows
installations of Condor. Is this a mistake, or is it not available on
Windows?
2. Does anyone have suggestions for querying Condor to help detect
potential performance issues?
3. I am getting the following error, and I am not sure how to determine
whether I need to modify my configuration or whether something else is
wrong.
SchedLog excerpt:
06/30 18:53:04 (pid:1732) Received UDP command 60011 (DC_NOP) from <xxx.xxx.xxx.xx:9608>, access level READ
06/30 18:53:04 (pid:1732) Calling HandleReq <handle_nop()> (0)
06/30 18:53:04 (pid:1732) Return from HandleReq <handle_nop()> (handler: 0.000s, sec: 0.371s)
06/30 18:53:04 (pid:1732) Calling Handler <SecManStartCommand::WaitForSocketCallback DC_INVALIDATE_KEY> (6)
06/30 18:53:04 (pid:1732) SECMAN: resuming command 60014 DC_INVALIDATE_KEY to daemon at <xxx.xxx.xxx.xx:4278> from TCP port 4371 (non-blocking, raw).
06/30 18:53:04 (pid:1732) SECMAN: TCP connection to daemon at <xxx.xxx.xxx.xx:4278> failed.
06/30 18:53:04 (pid:1732) Failed to send DC_INVALIDATE_KEY to daemon at <xxx.xxx.xxx.xx:4278>: SECMAN:2003:TCP connection to daemon at <xxx.xxx.xxx.xx:4278> failed.
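
To give an idea of the kind of monitoring I have in mind, here is a minimal Python sketch that counts the "TCP connection to daemon ... failed" entries per daemon address. The regular expression is an assumption based only on the excerpt above, not on any documented log format, and it expects each log entry to sit on a single line (wrapped lines rejoined). The function name count_failed_connections and the sample list are purely illustrative.

```python
import re
from collections import Counter

# Pattern inferred from the SchedLog excerpt only (an assumption, not a
# documented format): timestamp, pid, then the SECMAN TCP failure message.
FAILED = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(?P<pid>\d+)\) "
    r"SECMAN: TCP connection to daemon at <(?P<addr>[^>]+)> failed\."
)

def count_failed_connections(lines):
    """Return a Counter mapping daemon address to failed TCP connection count."""
    counts = Counter()
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            counts[match.group("addr")] += 1
    return counts

# Three entries copied from the excerpt, one entry per line.
sample = [
    "06/30 18:53:04 (pid:1732) SECMAN: resuming command 60014 DC_INVALIDATE_KEY "
    "to daemon at <xxx.xxx.xxx.xx:4278> from TCP port 4371 (non-blocking, raw).",
    "06/30 18:53:04 (pid:1732) SECMAN: TCP connection to daemon at "
    "<xxx.xxx.xxx.xx:4278> failed.",
    "06/30 18:53:04 (pid:1732) Failed to send DC_INVALIDATE_KEY to daemon at "
    "<xxx.xxx.xxx.xx:4278>: SECMAN:2003:TCP connection to daemon at "
    "<xxx.xxx.xxx.xx:4278> failed.",
]

for addr, n in count_failed_connections(sample).items():
    print(addr, n)
```

Only the entry that begins with "SECMAN: TCP connection" is counted; the follow-up "Failed to send" entry repeats the same failure, so it is deliberately skipped to avoid double counting.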
4. Occasionally I get an error when I use condor_status or condor_q, which
I believe is related to the errors in Q3:
Failed to fetch ads from ...
SECMAN: 2007: Failed to end classad
schedlog