[HTCondor-users] Multiple Collectors + CCB & Shared Port
- Date: Mon, 28 Nov 2016 16:45:02 +0100
- From: Frank Fischer <frank.fischer@xxxxxxx>
- Subject: [HTCondor-users] Multiple Collectors + CCB & Shared Port
Hi,
While toying with HTCondor (8.4.9), I ran into a problem when trying
to run multiple collectors on the same machine in conjunction
with CCB and Shared_Port.
I'll give a short overview of my setup [which may help others who want
to set up a similar infrastructure] and present the problem at the end of
the mail.
----
What I essentially want is to load-balance the CCBs with the following
setup:
1. Main Collector reachable on the default Port 9618
2. Multiple sub-collectors running on other ports (>20000) which are
reporting to the main collector via CONDOR_VIEW_HOST.
3. Negotiator, Schedd, StartD, etc. running "as usual" and using default
ports
4. (Long term) Main collector & negotiator mirrored onto another machine
via Condor HAD.
I followed
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSetUpElasticComputeCloudPools
https://research.cs.wisc.edu/htcondor/HTCondorWeek2016/presentations/WedHover_provisioning.pdf
which led me to the following configuration file(s) [I copy/pasted the
content in the order they are loaded in config.d]:
USE_SHARED_PORT = TRUE
AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST = TRUE
SHARED_PORT_PORT = $(COLLECTOR_PORT)
COLLECTOR_USES_SHARED_PORT = True
### Collector Setup
DAEMON_LIST = $(DAEMON_LIST), \
COLLECTOR
COLLECTOR.COLLECTOR_NAME = Collector test
# Ability for multiple sub-collectors on a single machine
COLLECTOR.NumProcesses = 4
# Request Handling (condor_status): use multiple cores
COLLECTOR.COLLECTOR_QUERY_WORKERS = $(DETECTED_CORES)/$(COLLECTOR.NumProcesses:1)
# Keep 100 MB Pool History file
COLLECTOR.KEEP_POOL_HISTORY = TRUE
COLLECTOR.POOL_HISTORY_DIR = $(LOG)
COLLECTOR.POOL_HISTORY_MAX_STORAGE = 100000000
## Run multiple sub-collectors on one node
# Be careful with network/shared port daemon
COLLECTOR_HOST = $(CONDOR_HOST):20000
UPDATE_COLLECTOR_WITH_TCP = True
USE_SHARED_PORT = False
COLLECTOR_USES_SHARED_PORT = False
COLLECTOR.USE_SHARED_PORT = False
COLLECTOR_ADDRESS_FILE =
NEGOTIATOR.USE_SHARED_PORT = True
SCHEDD.USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 9618
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
# Forward ads to the main collector
# This value must not be defined via a complex macro; HTCondor expects a host-list and sends to host "IfThenElse"
CONDOR_VIEW_HOST = $(CentralManager)
COLLECTOR02 = $(COLLECTOR)
COLLECTOR03 = $(COLLECTOR)
COLLECTOR02_ARGS = -f -p 20002 -sock collector2 -local-name COLLECTOR02
COLLECTOR03_ARGS = -f -p 20003 -sock collector3 -local-name COLLECTOR03
# Set Logfile visible to Master
COLLECTOR02_LOG = $(LOG)/Collector02Log
COLLECTOR03_LOG = $(LOG)/Collector03Log
# Set log file visible to collector in the environment
COLLECTOR02_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector02Log _condor_COLLECTOR_HOST=$(CONDOR_HOST):20002 _condor_COLLECTOR_NAME=Collector2"
COLLECTOR03_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector03Log _condor_COLLECTOR_HOST=$(CONDOR_HOST):20003 _condor_COLLECTOR_NAME=Collector3"
DAEMON_LIST = $(DAEMON_LIST), \
COLLECTOR02, \
COLLECTOR03
---------------------------------------
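For anyone extending this to more sub-collectors, the per-collector stanzas above follow a regular pattern and could be generated rather than copied by hand. The following is only a sketch: the function name `sub_collector_config` and the `base_port` parameter are hypothetical, and it simply reproduces the naming scheme of the COLLECTOR02/COLLECTOR03 entries above without validating anything against HTCondor.

```python
def sub_collector_config(first, last, base_port=20000):
    """Return config lines for sub-collectors numbered first..last.

    Hypothetical helper: mirrors the COLLECTORnn stanzas from the mail,
    with sub-collector n listening on base_port + n.
    """
    lines = []
    names = []
    for n in range(first, last + 1):
        name = f"COLLECTOR{n:02d}"
        port = base_port + n
        log = f"$(LOG)/Collector{n:02d}Log"
        lines.append(f"{name} = $(COLLECTOR)")
        lines.append(
            f"{name}_ARGS = -f -p {port} -sock collector{n} -local-name {name}"
        )
        # Log file as seen by the master
        lines.append(f"{name}_LOG = {log}")
        # Log file, collector host and name as seen by the sub-collector itself
        lines.append(
            f'{name}_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG={log} '
            f"_condor_COLLECTOR_HOST=$(CONDOR_HOST):{port} "
            f'_condor_COLLECTOR_NAME=Collector{n}"'
        )
        names.append(name)
    # One DAEMON_LIST line, without a trailing comma or continuation
    lines.append("DAEMON_LIST = $(DAEMON_LIST), " + ", ".join(names))
    return lines


if __name__ == "__main__":
    print("\n".join(sub_collector_config(2, 3)))
```

Calling `sub_collector_config(2, 3)` yields the same COLLECTOR02 and COLLECTOR03 definitions as listed above, followed by a single DAEMON_LIST line.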
The examples/wiki entries I've linked, etc. tell me to use
USE_SHARED_PORT = True
COLLECTOR.USE_SHARED_PORT = False
which should result in all the collectors ignoring the shared port.
I toyed around a lot with assigning socket names, different/no ports to
$COLLECTOR_HOST, different values for COLLECTOR_USES_SHARED_PORT, etc.
and it essentially comes down to a single problem:
The moment I set USE_SHARED_PORT = True, all other ports except 9618 are
closed.
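The symptom can be checked independently of HTCondor by probing the ports directly. The snippet below is a minimal sketch that only tests whether a TCP connection can be established; the helper name `port_open` and the host `localhost` are assumptions, and the port numbers are the ones from the configuration above (9618 for the shared port, 20000 for COLLECTOR_HOST, 20002 and 20003 for the sub-collectors).

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    host = "localhost"  # assumption: run on the central manager itself
    for port in (9618, 20000, 20002, 20003):
        state = "open" if port_open(host, port) else "closed"
        print(f"{host}:{port} {state}")
```

A successful connection only shows that something is listening on the port; it does not confirm which daemon answered.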
Can somebody point out what I'm doing wrong regarding the
combination of CCB + Shared Port together with multiple sub-collectors?
Thanks and best regards
Frank