We lost a network router for a couple of hours. After the network was restored, I noticed that the CONDOR daemons across the entire cluster were sitting in various error states. We are running CONDOR 6.6.9 under the Vanilla universe.
Snippet of the MasterLog from the submit machine:
3/15 10:48:49 Send_Signal: ERROR Connect to <> failed.
3/15 10:48:49 ERROR: failed to send signal 15 to pid 2496
3/15 10:48:50 Can't connect to <>:0, errno = 10061
3/15 10:48:50 Will keep trying for 10 seconds...
3/15 10:48:59 Connect failed for 10 seconds; returning FALSE
3/15 10:48:59 ERROR:
SECMAN:2003:TCP connection to <> failed
We were forced to kill the condor services on each machine individually (the services did not respond to a STOP signal). That is not a big deal with only a half-dozen machines, but as our cluster grows, this approach will quickly become impractical.
How do other CONDOR Windows users deal with these kinds of issues? Have you built scripts to handle this kind of network maintenance? Are there CONDOR utilities (that I'm obviously unaware of) that resolve these kinds of problems? Or do I need to upgrade to a newer version of CONDOR?
Thanks for any and all suggestions,
Tammy Chin
CATHENA Code Development Section
Thermalhydraulics Branch
J.L. Grey Engineering Centre, Stn. E6
Atomic Energy of Canada Ltd
Chalk River, ON K0J 1P0
Phone: 613.584.8811 x5010
Fax: 613.584.8023
Email: chint@xxxxxxx