[HTCondor-users] Multiple Collectors + CCB & Shared Port



Hi,

While toying with HTCondor (8.4.9), I ran into a problem when trying to run multiple collectors on the same machine in conjunction with CCB and Shared_Port.

I'll give a short overview of my setup [which may help others who want to set up a similar infrastructure] and present the problem at the end of the mail.

----
What I essentially want is to load-balance the CCBs, with the following setup:

1. Main Collector reachable on the default Port 9618
2. Multiple sub-collectors running on other ports (>20000) which report to the main collector via CONDOR_VIEW_HOST.
3. Negotiator, Schedd, StartD, etc. running "as usual" and using default ports
4. (Long term) Main collector & negotiator mirrored onto another machine via Condor HAD.

I followed
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSetUpElasticComputeCloudPools
https://research.cs.wisc.edu/htcondor/HTCondorWeek2016/presentations/WedHover_provisioning.pdf

which led me to the following configuration file(s) [I copy/pasted the content in the order they are loaded in config.d]:

USE_SHARED_PORT                         = TRUE
AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST = TRUE
SHARED_PORT_PORT                        = $(COLLECTOR_PORT)
COLLECTOR_USES_SHARED_PORT              = True

### Collector Setup

DAEMON_LIST     = $(DAEMON_LIST), \
                  COLLECTOR

COLLECTOR.COLLECTOR_NAME         = Collector test

# Ability for multiple sub-collectors on a single machine
COLLECTOR.NumProcesses = 4

# Request handling (condor_status): split the detected cores across the
# collector processes (NumProcesses falls back to 1 if it is not defined)
COLLECTOR.COLLECTOR_QUERY_WORKERS     = $(DETECTED_CORES)/$(COLLECTOR.NumProcesses:1)

# Keep 100 MB Pool History file
COLLECTOR.KEEP_POOL_HISTORY           = TRUE
COLLECTOR.POOL_HISTORY_DIR            = $(LOG)
COLLECTOR.POOL_HISTORY_MAX_STORAGE    = 100000000

## Run multiple sub-collectors on one node
# Be careful with network/shared port daemon
COLLECTOR_HOST                  = $(CONDOR_HOST):20000
UPDATE_COLLECTOR_WITH_TCP       = True

USE_SHARED_PORT                 = False
COLLECTOR_USES_SHARED_PORT      = False
COLLECTOR.USE_SHARED_PORT       = False
COLLECTOR_ADDRESS_FILE          =
NEGOTIATOR.USE_SHARED_PORT      = True
SCHEDD.USE_SHARED_PORT          = True

SHARED_PORT_ARGS                = -p 9618

DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT

# Forward ads to the main collector
# This value must not be defined via a complex macro: HTCondor expects a plain host list here and would otherwise try to send to a host named "IfThenElse"
CONDOR_VIEW_HOST          = $(CentralManager)

COLLECTOR02 = $(COLLECTOR)
COLLECTOR03 = $(COLLECTOR)

COLLECTOR02_ARGS = -f -p 20002 -sock collector2 -local-name COLLECTOR02
COLLECTOR03_ARGS = -f -p 20003 -sock collector3 -local-name COLLECTOR03

# Set Logfile visible to Master
COLLECTOR02_LOG = $(LOG)/Collector02Log
COLLECTOR03_LOG = $(LOG)/Collector03Log
# Set log file visible to collector in the environment
COLLECTOR02_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector02Log _condor_COLLECTOR_HOST=$(CONDOR_HOST):20002 _condor_COLLECTOR_NAME=Collector2"
COLLECTOR03_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector03Log _condor_COLLECTOR_HOST=$(CONDOR_HOST):20003 _condor_COLLECTOR_NAME=Collector3"

DAEMON_LIST = $(DAEMON_LIST), \
              COLLECTOR02, \
              COLLECTOR03

---------------------------------------

The examples and wiki entries I've linked above tell me to use

USE_SHARED_PORT                 = True
COLLECTOR.USE_SHARED_PORT       = False

which should result in all the collectors ignoring the shared port.

I toyed around a lot with assigning socket names, different/no ports to $COLLECTOR_HOST, different values for COLLECTOR_USES_SHARED_PORT, etc. and it essentially comes down to a single problem:

The moment I set USE_SHARED_PORT = True, all other ports except 9618 are closed.
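
For anyone who wants to reproduce this observation, a small Python sketch along the following lines can check which of the ports accept TCP connections. It is only a rough probe, not part of HTCondor: the helper names port_is_open and probe are made up for illustration, the host name is a placeholder to be replaced with the central manager's address, and the port list is simply taken from the configuration above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Placeholder host; replace with the machine running the collectors.
    host = "localhost"
    # Shared port / main collector port plus the ports from the config above.
    for port, is_open in probe(host, [9618, 20000, 20002, 20003]).items():
        print(f"{host}:{port} {'open' if is_open else 'closed'}")
```

An accepted connection only shows that something is listening on the port; it does not tell whether that is the shared port daemon or a collector.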

Can somebody point out what I'm doing wrong regarding the combination of CCB + Shared Port together with multiple sub-collectors?

Thanks and best regards
Frank