TJ,

I attempted to set up your solution, but I am running into issues with the CREDD not being located on CM02. The configuration information for the schedd (CM01) is:

CONDOR_HOST = $(FULL_HOSTNAME)
DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR SCHEDD
COLLECTOR = $(SBIN)/condor_collector.exe
NEGOTIATOR = $(SBIN)/condor_negotiator.exe
CREDD_HOST = CM02.XXXXXXXX.com
CREDD_CACHE_LOCALLY = True

This was the SchedLog output with D_ALL when trying to run a job:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) New Daemon obj (credd) name: "NULL", pool: "NULL", addr: "NULL"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) No name given, but CREDD_HOST defined to "CM02.XXXXXXXX.com"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Finding proper daemon name for "CM02.XXXXXXXX.com"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Daemon name contains no '@', treating as a regular hostname
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Returning daemon name: "CM02.XXXXXXXX.com"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Using "CM02.XXXXXXXX.com" for name in Daemon object
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Using "CM02.XXXXXXXX.com" for full hostname in Daemon object
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Local daemon name would be "CM01.XXXXXXXX.com"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) COLLECTOR_HOST is set to "CM01.XXXXXXXX.com"
09/07/21 15:36:19 (fd:5) (pid:92) (D_DAEMONCORE) *** TIMEOUT_MULTIPLIER :: 0
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if CM01.XXXXXXXX.com is a sinful address
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) CM01.XXXXXXXX.com is not a sinful address: does not begin with "<"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) New Daemon obj (collector) name: "CM01.XXXXXXXX.com", pool: "NULL", addr: "NULL"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Using name "CM01.XXXXXXXX.com" to find daemon
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Port not specified, using default (9618)
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Host info "CM01.XXXXXXXX.com" is a hostname, finding IP address
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) DNS returned:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) fe80::a4ff:5e4c:bb0:ea3a
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) 10.1.22.53
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) We returned:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) 10.1.22.53
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) fe80::a4ff:5e4c:bb0:ea3a
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Found IP address and port <10.1.22.53:9618>
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Daemon client (collector) address determined: name: "CM01.XXXXXXXX.com", pool: "CM01.XXXXXXXX.com", alias: "CM01.XXXXXXXX.com", addr: "<10.1.22.53:9618>"
09/07/21 15:36:19 (fd:5) (pid:92) (D_DAEMONCORE) *** TIMEOUT_MULTIPLIER :: 0
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if <10.1.22.53:9618> is a sinful address
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) <10.1.22.53:9618> is a sinful address!
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Daemon client (collector) address determined: name: "NULL", pool: "NULL", alias: "NULL", addr: "<10.1.22.53:9618>"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) New Daemon obj (collector) name: "NULL", pool: "NULL", addr: "<10.1.22.53:9618>"
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if <10.1.22.53:9618> is a sinful address
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) <10.1.22.53:9618> is a sinful address!
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if <10.1.22.53:9618> is a sinful address
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) <10.1.22.53:9618> is a sinful address!
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Already have address, no info to locate
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Address "<10.1.22.53:9618>" specified but no name, looking up host info
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) DNS returned:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) fe80::a4ff:5e4c:bb0:ea3a
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) 10.1.22.53
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) We returned:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) 10.1.22.53
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) fe80::a4ff:5e4c:bb0:ea3a
09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) IPVERIFY: for CM01.XXXXXXXX.com matched 10.1.22.53 to 10.1.22.53
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Querying collector <10.1.22.53:9618> (CM01.XXXXXXXX.com) with classad:
LocationQuery = "CM02.XXXXXXXX.com"
Projection = "CondorVersion CondorPlatform MyAddress AddressV1 Name Machine"
TargetType = "CredD"
LimitResults = 1
MyType = "Query"
Requirements = ((Name == "CM02.XXXXXXXX.com"))
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) --- End of Query ClassAd ---
09/07/21 15:36:19 (fd:5) (pid:92) (D_COMMAND) Daemon::startCommand(QUERY_ANY_ADS,...) making connection to <10.1.22.53:9618>
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Guess address string for host = <10.1.22.53:9618>, port = 0
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) it was sinful string. ip = 10.1.22.53, port = 9618
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) get_port_range - not checking LOWPORT, HIGHPORT for outgoing connection on Windows.
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) CONNECT bound to <10.1.22.53:60022> fd=128 peer=<10.1.22.53:9618>
09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) SECMAN: command 48 QUERY_ANY_ADS to collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com) from TCP port 60022 (blocking).
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_write(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=554,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=5,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=280,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=5,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=187,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) SECMAN: added session CM01:3372:1631054179:21 to cache for 86400 seconds (3600s lease).
09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) SECMAN: startCommand succeeded.
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_write(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=274,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=5,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=8,timeout=60,flags=0,non_blocking=0)
09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) CLOSE TCP <10.1.22.53:60022> fd=128
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Destroying Daemon object:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Type: 5 (collector), Name: (null), Addr: <10.1.22.53:9618>
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) FullHost: CM01.XXXXXXXX.com, Host: CM01, Pool: (null), Port: 9618
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) IsLocal: N, IdStr: collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com), Error: (null)
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) --- End of Daemon object info ---
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Destroying Daemon object:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Type: 5 (collector), Name: CM01.XXXXXXXX.com, Addr: <10.1.22.53:9618>
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) FullHost: CM01.XXXXXXXX.com, Host: CM01, Pool: CM01.XXXXXXXX.com, Port: 9618
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) IsLocal: N, IdStr: (null), Error: (null)
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) --- End of Daemon object info ---
09/07/21 15:36:19 (fd:5) (pid:92) (D_ALWAYS) Can't find address for credd CM02.XXXXXXXX.com
09/07/21 15:36:19 (fd:5) (pid:92) (D_COMMAND) Daemon::startCommand(CREDD_GET_PASSWD,...) making connection to NULL
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Destroying Daemon object:
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Type: 13 (credd), Name: CM02.XXXXXXXX.com, Addr: (null)
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) FullHost: CM02.XXXXXXXX.com, Host: (null), Pool: (null), Port: -1
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) IsLocal: N, IdStr: (null), Error: Can't find address for credd CM02.XXXXXXXX.com
09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) --- End of Daemon object info ---
09/07/21 15:36:19 (fd:5) (pid:92) (D_ALWAYS) ERROR: Could not locate valid credential for user 'lpalmer@XXXXXXXX'

Have I missed something in the configuration? Is it the CREDD_PORT you were referring to that I need to add?

Thanks,
Lachlan

From: Lachlan Palmer

Cheers TJ and Greg,

Both of these options are great.

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Hitchen, Greg (IM&T, Kensington WA)

Hi Lachlan,

We have 9 separate pools of Windows execute nodes, each with Linux central managers. We have all our submit nodes in one of those pools. We also have a standalone Windows credd machine in that same pool. So that all Windows execute nodes in all the pools can also see the credd machine, its CONDOR_HOST configuration points to multiple pools:

CONDOR_HOST = pool1.xxx.xxx, pool2.xxx.xxx, pool3.xxx.xxx, etc.

That way the central managers in every pool will know about the credd machine. I will point out that we have not gone into production with this yet, but we have tested it and everything seems to work OK.

Cheers
Greg

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of John M Knoeller

There is no particular need to have the condor_credd running on the same machine as the condor_collector. The central manager does not need to know about the credd at all. The Schedd and the execute nodes need to know how to locate it, but the central manager does not.

If you have only a single Schedd, you might consider running a single condor_credd on that machine. Otherwise you can run the condor_credd on any machine you choose. If you have a domain controller or Active Directory server, you might consider running the condor_credd on that machine.

You just need to set the CREDD_HOST configuration variable on all of the Schedd and execute nodes to point to the machine where the condor_credd is running. If you use a dedicated CREDD_PORT, make sure to include it in the value of CREDD_HOST.

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Lachlan Palmer <LPalmer@xxxxxxxxxxxx>

Hi All,

I am running into issues with running jobs in different pools. We have three pools of Windows machines, each with its own central manager running a condor_credd daemon. Everything works fine when submitting jobs within the pool the submit node is in, but when jobs are launched into another pool there is a match failure: the job’s LocalCredd points to the submit node’s central manager, while the LocalCredd of the execute nodes in the other pool is that pool’s central manager.

What is the recommended configuration in this case? Should we just pick one pool’s central manager to host the sole condor_credd? What is the appropriate way to configure the other central managers to point to this credd? Is it simply the same as for the submit and execute config files?

For more information, here are the condor_credd config lines for one of our central managers (CM01):

## CREDD logging settings
## Customize these if you wish.
CREDD_LOG = $(LOG)/CreddLog
CREDD_DEBUG = D_COMMAND
MAX_CREDD_LOG = 50000000

# Timeout session quickly since we normally only get contacted
# once per starter
SEC_CREDD_SESSION_TIMEOUT = 10

# Set security settings so that full security to the credd is required
CREDD.SEC_DEFAULT_AUTHENTICATION = REQUIRED
CREDD.SEC_DEFAULT_ENCRYPTION = REQUIRED
CREDD.SEC_DEFAULT_INTEGRITY = REQUIRED
CREDD.SEC_DEFAULT_NEGOTIATION = REQUIRED
# Require PASSWORD auth for password fetching
CREDD.SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD

# Only honor password fetch requests to the trusted "condor_pool" user
CREDD.ALLOW_DAEMON = condor_pool@$(UID_DOMAIN)

# Require NTSSPI for storing credentials
CREDD.SEC_DEFAULT_AUTHENTICATION_METHODS = NTSSPI

CREDD_HOST = $(CONDOR_HOST)
CREDD_CACHE_LOCALLY = True

And for the execute and submit config:

CREDD_HOST = CM01.XXXXXXX.com
CREDD_CACHE_LOCALLY = True
STARTER_ALLOW_RUNAS_OWNER = True

Thanks,
Lachlan

This communication (both the message and any attachments or links) is confidential and only intended for the use of the person or persons to whom it is addressed unless we have expressly
authorized otherwise. It also may contain information that is protected by solicitor-client privilege. If you are reading this communication and are not an addressee or authorized representative of an addressee, we hereby notify you that any distribution,
copying or other use of it without our express authorization is strictly prohibited. If you have received this communication in error, please delete both the message and any attachments from your system and notify us immediately by e-mail or phone. In addition,
we note that this communication and its transmission of data have not been secured by encryption. Therefore, we are not able to confirm or guarantee that the communication has not been intercepted, amended, or read by an unintended third party.
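Greg's multi-pool approach, described earlier in the thread, could be expressed on the standalone credd machine roughly as follows. This is a minimal sketch, not his actual file: the pool hostnames are the placeholders from his message, the DAEMON_LIST and CREDD_HOST lines are assumptions, and it relies on COLLECTOR_HOST defaulting to $(CONDOR_HOST), so that a comma-separated list makes the credd send its ad to every listed central manager.

```text
# Local config for the standalone Windows credd machine (sketch only).
# Hostnames are the placeholders from Greg's message; add one entry per pool.
# COLLECTOR_HOST defaults to $(CONDOR_HOST), so the credd's ad is
# sent to every central manager in this list.
CONDOR_HOST = pool1.xxx.xxx, pool2.xxx.xxx, pool3.xxx.xxx

# Assumed: this machine runs only the master and the credd.
DAEMON_LIST = MASTER CREDD

# Assumed: name the credd after this machine, because $(CONDOR_HOST)
# is now a list rather than a single host.
CREDD_HOST = $(FULL_HOSTNAME)
CREDD_CACHE_LOCALLY = True
```

The CREDD.* security settings from Lachlan's CM01 config would presumably move to this machine unchanged, and the Schedd and execute nodes in every pool would then set CREDD_HOST to this machine's hostname.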
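TJ's advice, applied to the layout in Lachlan's follow-up (condor_credd on CM02, Schedd on CM01), might look like the sketch below. The SchedLog shows the Schedd asking its own collector on CM01 for a CredD ad named CM02.XXXXXXXX.com and getting nothing back; TJ's note suggests that an explicit port in CREDD_HOST is the way to give the Schedd a usable address instead. Treat this as an unverified sketch: the port 9620 is an assumed value, and the CREDD_PORT/CREDD_ARGS pair follows the example condor_credd configuration from older HTCondor manuals, so check it against the manual for your version.

```text
# On CM02, the machine running condor_credd (sketch only).
# Assumed port value; CREDD_PORT is a plain macro used by CREDD_ARGS.
CREDD_PORT = 9620
# -p pins the credd's command port so it is the same on every restart.
CREDD_ARGS = -p $(CREDD_PORT) -f
CREDD_HOST = $(FULL_HOSTNAME):$(CREDD_PORT)
DAEMON_LIST = $(DAEMON_LIST) CREDD

# On every Schedd and execute node, in every pool.
# The port is included in CREDD_HOST, as TJ describes.
CREDD_HOST = CM02.XXXXXXXX.com:9620
CREDD_CACHE_LOCALLY = True
STARTER_ALLOW_RUNAS_OWNER = True
```

Every Schedd and execute node gets the same CREDD_HOST value, which matches TJ's point that only those machines, and not the central managers, have to locate the credd.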