Hi,

After some tests on selected nodes at the end of last year, I’ve upgraded our flock to the most recent release of HTCondor. So far, I am still testing on a few nodes while leaving the rest of the flock intact. I was able to set up a base flock as described in the manual at
https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html and submit a working job. My current concern is to replicate the High Availability Daemon (HAD) CM setup we’ve had for the last 8 years on a restricted number of nodes.

I’ve set up new test CMs on nodes A and B, and I am attempting to have them talk to each other before I add an AP and an Execute node. I installed CM A and CM B using the script at
https://get.htcondor.org/ with the central manager role and the same password for each (A is set up with A as its CM and B is set up with B as its CM).

I’ve added a file named 99-spc-cm.config to /etc/condor/config.d/ on each CM with the following content:

    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
    NEGOTIATOR_HOST =
    CONDOR_HOST =
    CENTRAL_MANAGER1 = A
    CENTRAL_MANAGER2 = B
    COLLECTOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)
    HAD_USE_PRIMARY = TRUE
    HAD_USE_REPLICATION = TRUE

I’ve copied the IDTokens from A to B (/etc/condor/token.id) and from B to A (same spot) and checked that the permissions were OK, but so far the machines do not appear to be able to talk to each other (see below). I also tried generating tokens manually on each machine for the opposite CM instead of copying files around, with the same result.

Upon reconfiguration, on A, in /var/log/condor/MasterLog, I got errors similar to:

    01/12/24 12:07:15 SECMAN: FAILED: Received "DENIED" from server for user condor@A using method IDTOKENS.
    01/12/24 12:07:15 ERROR: SECMAN:2010:Received "DENIED" from server for user condor@A using method IDTOKENS.
    01/12/24 12:07:15 Failed to start non-blocking update to B
    01/12/24 12:07:16 Token requested not yet approved; please ask collector B admin to approve request ID 9651637.

There are similar errors in the logs on B.

If I go on B and try to approve the request, I get:

    $ condor_token_request_approve
    Remote daemon has no request to approve.
    $ condor_token_request_approve -reqid 9651637
    Remote daemon did not provide information for request ID 9651637.

I am not quite sure where to go from here.

Thanks for your help.
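
A side note for anyone reproducing this: the sketch below is not HTCondor's own configuration parser. It is only a minimal Python illustration of how the $(...) macros in the 99-spc-cm.config fragment above would be expected to expand, assuming the two empty assignments (NEGOTIATOR_HOST and CONDOR_HOST) simply clear earlier values. The helper names parse_config and expand are hypothetical. On a real node, running condor_config_val COLLECTOR_HOST reports the value HTCondor itself sees.

```python
import re

# Config fragment from the message above. Assumption: the two empty
# assignments are deliberate and clear values set by earlier config files.
CONFIG_TEXT = """\
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
NEGOTIATOR_HOST =
CONDOR_HOST =
CENTRAL_MANAGER1 = A
CENTRAL_MANAGER2 = B
COLLECTOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)
HAD_USE_PRIMARY = TRUE
HAD_USE_REPLICATION = TRUE
"""

# Matches a $(NAME) macro reference.
MACRO = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Parse 'NAME = value' lines into a dict; later assignments win."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def expand(name, settings, depth=0):
    """Recursively substitute $(NAME) references; unknown names become ''."""
    if depth > 20:
        raise ValueError(f"macro recursion too deep at {name}")
    value = settings.get(name, "")
    return MACRO.sub(lambda m: expand(m.group(1), settings, depth + 1), value)


settings = parse_config(CONFIG_TEXT)
collector_host = expand("COLLECTOR_HOST", settings)
print(collector_host)  # A,B
```

With this reading, both CMs advertise to the same two-collector list, so each collector has to accept tokens presented by the other host's daemons.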