Hi all,

I bumped our central manager from centos 6 to 7 and condor from 8.2.10 to 8.6.1. All configs are the same.

condor_status gives

> # condor_status
> Error: communication error
> CEDAR:6001:Failed to connect to <144.92.167.251:9618>

Master log says

> 03/14/17 12:50:38 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 03/14/17 12:50:38 ERROR: SECMAN:2003:TCP connection to collector exocet.bmrb.wisc.edu failed.
> 03/14/17 12:50:38 Failed to start non-blocking update to <144.92.167.251:9618>.
> 03/14/17 12:55:38 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 03/14/17 12:55:38 ERROR: SECMAN:2003:TCP connection to collector exocet.bmrb.wisc.edu failed.
> 03/14/17 12:55:38 Failed to start non-blocking update to <144.92.167.251:9618>.

Collector log complains about old history files then says

> 03/14/17 13:06:00 CollectorAd : Inserting ** "< BioMagResBank, UW-Madison@xxxxxxxxxxxxxxxxxxxx >"
> 03/14/17 13:06:00 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 03/14/17 13:06:00 Failed to send update to collector exocet.bmrb.wisc.edu.
> 03/14/17 13:06:00 Unable to send UPDATE_COLLECTOR_AD to all configured collectors

Start, sched, and negotiator logs end with the same "Connection refused (connect errno = 111)". There's nothing in any of the /var/log/condor/* logs that indicates any problem.

The port is open and iptables has blanket accept for loopback and local subnet.

> # lsof -i -P | grep 9618
> condor_co 4071 condor   12u  IPv4  59575  0t0  UDP *:9618
> condor_co 4071 condor   14u  IPv6  59577  0t0  UDP *:9618

> # iptables -nvL
> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target  prot opt in  out  source             destination
>  1074 77368 ACCEPT  all  --  lo  *    0.0.0.0/0          0.0.0.0/0
> ...
> 56917 5941K ACCEPT  all  --  *   *    144.92.167.128/25  0.0.0.0/0

Yet

> # telnet localhost 9618
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused

If I stop condor and run netcat on port 9618 I get a whole lot of stuff, coming from other nodes presumably. So it looks like the port's fine and it's the collector that's refusing to talk to itself.

Any suggestions as to where to look next?

TIA
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
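
For reference, the telnet test above can be reproduced in a few lines of Python. This is only a sketch, assuming Python 3 is available on the central manager; `tcp_port_state` is a hypothetical helper name, not part of HTCondor. It attempts a TCP connection and reports whether it was accepted or refused. Note that the lsof output above lists only UDP sockets on 9618, while telnet and the daemon updates in the logs use TCP, so a "refused" result here would be consistent with nothing listening on TCP 9618.

```python
import socket


def tcp_port_state(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and return 'open', 'refused', or 'error: ...'.

    Hypothetical helper for illustration; it only mirrors what
    `telnet host port` checks and knows nothing about HTCondor itself.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition as "Connection refused (connect errno = 111)" on Linux.
        return "refused"
    except OSError as exc:
        # Timeouts, unreachable networks, missing IPv6 support, and so on.
        return f"error: {exc}"


if __name__ == "__main__":
    # Loopback addresses only, matching the telnet test in the post.
    for host in ("127.0.0.1", "::1"):
        print(host, 9618, tcp_port_state(host, 9618))
```

Running it with condor started and again with netcat listening on the port should show whether the refusal follows the collector or the port.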