Hi all,

I've been looking into secondary collectors, as we'll be needing them soon enough. I followed the wiki guides on the config needed for them. For reference, we're running the 8.3.4 release of HTCondor.

## Configure the sub-collectors for tiered collecting.
## Reduces load on the central collector
COLLECTOR2 = $(COLLECTOR)
COLLECTOR2_ARGS = -f -p 10002
COLLECTOR2_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector2Log"
COLLECTOR3 = $(COLLECTOR)
COLLECTOR3_ARGS = -f -p 10003
COLLECTOR3_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector3Log"
COLLECTOR4 = $(COLLECTOR)
COLLECTOR4_ARGS = -f -p 10004
COLLECTOR4_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector4Log"
CONDOR_VIEW_HOST = $(COLLECTOR_HOST)

As you can see, I specify the ports as 10001-10004. The new processes start up fine and everything looks okay, so I went ahead and added a randomly chosen port selection to our worker nodes. However, this is the result on the worker nodes:

04/22/15 09:09:53 attempt to connect to <(cm_ip):10002> failed: Connection refused (connect errno = 111).
04/22/15 09:09:53 ERROR: SECMAN:2004:Failed to create security session to <128.142.152.233:10002> with TCP.|SECMAN:2003:TCP connection to <128.142.152.233:10002> failed.
04/22/15 09:09:53 Failed to start non-blocking update to <128.142.152.233:10002>.

I initially thought the firewall might be the problem, so I checked it and temporarily disabled it. That wasn't the case.

The PID of the collector that is supposed to be running on port 10002 is 1980256. However, when I check netstat I get the following:

~]# netstat -tulpn | grep 1980256
tcp        0      0 0.0.0.0:41384      0.0.0.0:*      LISTEN      1980256/condor_coll
udp        0      0 0.0.0.0:41384      0.0.0.0:*                  1980256/condor_coll

Sure enough, changing the randomly chosen port on the worker node to 49225 results in the collector receiving the payload and registering the worker.

Does anyone have any suggestions? Have I perhaps got a typo in the collector spawning config that you can spot?

Thanks,
Iain