I am replying to this thread because I have debugged a few things that are most certainly related, but I still haven't solved my problem.

My first step was increasing the logging: I thought I had it set higher than I did, so I raised it to D_SECURITY:3. After doing this I found out that Condor was failing DNS lookups for other machines because it was using the wrong interface, so the machines were unable to match their domain names against their allow-list. Then I fixed an issue I was having with getting my machines to report FQDNs instead of normal domain names via the DEFAULT_DOMAIN_NAME macro.

After this, it seemed like Kerberos authentication would be a better fit, but I still can't get my machines to authenticate with each other, nor is Condor able to authenticate any of my domain users. I updated my security config to look like the following:

==============================================
@use SECURITY : Strong
SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS
ALLOW_READ = */*
ALLOW_WRITE = */*
ALLOW_ADMINISTRATOR = condor-admin*/*
ALLOW_CONFIG = condor-admin*/*
ALLOW_NEGOTIATOR = condor*/submit1*
ALLOW_DAEMON = condor*/*
==============================================

condor-admin is a valid domain user, and submit1 is where my condor_schedd daemon lives.
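As a rough illustration of how the user/host entries in the config above read, here is a minimal Python sketch. It is an approximation only: fnmatch-style globbing stands in for HTCondor's own matcher, whose exact rules may differ, and `entry_matches`, the `example.com` domain, and the identities passed in are all hypothetical.

```python
from fnmatch import fnmatchcase


def entry_matches(entry, user, host):
    """Glob-match one 'user-pattern/host-pattern' entry against a user and host.

    Approximation only: this is NOT HTCondor's matcher, just shell-style
    wildcards applied to each half of the entry.
    """
    user_pat, sep, host_pat = entry.partition("/")
    if not sep:
        raise ValueError(f"expected a 'user/host' entry, got {entry!r}")
    return fnmatchcase(user, user_pat) and fnmatchcase(host, host_pat)


# Hypothetical identities; example.com is a placeholder domain.
print(entry_matches("condor-admin*/*", "condor-admin@example.com", "192.168.0.68"))
print(entry_matches("condor*/submit1*", "condor@example.com", "submit1.example.com"))
print(entry_matches("condor*/submit1*", "condor@example.com", "192.168.0.68"))
```

Under this approximation the last call is the only one that fails: a host pattern such as submit1* can only match when the peer is identified by name rather than by address, which is where a failed DNS lookup would matter.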
But when I fire up Condor, my understanding is that for some reason my schedd daemon is sending the following ClassAd to try to authenticate with the manager:

==============================================
ServerCommandSock = "<192.168.0.68:9618?addrs=192.168.0.68-9618&noUDP&sock=3949_4396_3>"
Enact = "YES"
Subsystem = "SCHEDD"
ParentUniqueID = "submit1:3949:1594324900"
TriedAuthentication = true
Integrity = "YES"
ServerPid = 3988
Encryption = "YES"
Authentication = "NO"
RemoteVersion = "$CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $"
SessionLease = 3600
OutgoingNegotiation = "REQUIRED"
User = "condor@parent"
UseSession = "YES"
CryptoMethods = "3DES"
Sid = "3e7fbe4351131b1ebe8437b870ffb34994c8a91b8ba1e0f9"
ValidCommands = "60000,60008,60026,60017,60004,60012,60021,60043,60007,457,60020,60044"
Command = 60008
SessionDuration = "86400"
AuthMethods = "PASSWORD"
==============================================

This is throwing me for a loop, because PASSWORD is not listed as an authentication method in my security config. My manager node is sending back the following response:

==============================================
Encryption = "YES"
Integrity = "YES"
AuthMethodsList = ""
CryptoMethods = "3DES,BLOWFISH"
Authentication = "YES"
SessionDuration = "86400"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $"
Enact = "YES"
==============================================

This seems to suggest it isn't finding any authentication methods in common.
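To compare the two dumps above mechanically rather than by eye, something like the following Python sketch could be used. `parse_ad` and `method_list` are hypothetical helpers, not part of HTCondor; the parser only handles the flat `Name = value` form shown in the log, not the full ClassAd language, and treating AuthMethods and AuthMethodsList as comma-separated lists is an assumption based on how CryptoMethods appears.

```python
import re

# Matches `Name = "quoted string"` or `Name = bare_token` pairs as they
# appear in the log dump. This is not a full ClassAd parser.
PAIR = re.compile(r'(\w+) = ("(?:[^"\\]|\\.)*"|\S+)')


def parse_ad(text):
    """Return a dict of attribute name -> value, with outer quotes removed."""
    ad = {}
    for name, raw in PAIR.findall(text):
        ad[name] = raw[1:-1] if raw.startswith('"') else raw
    return ad


def method_list(value):
    """Split a comma-separated method string into a list, dropping empties."""
    return [m.strip() for m in value.split(",") if m.strip()]


# Excerpts copied from the two ads above.
sent = parse_ad('Authentication = "NO" CryptoMethods = "3DES" AuthMethods = "PASSWORD"')
reply = parse_ad('AuthMethodsList = "" CryptoMethods = "3DES,BLOWFISH" Authentication = "YES"')

print("sent AuthMethods:      ", method_list(sent["AuthMethods"]))
print("reply AuthMethodsList: ", method_list(reply["AuthMethodsList"]))
print("reply CryptoMethods:   ", method_list(reply["CryptoMethods"]))
```

Run against these excerpts, the reply's AuthMethodsList parses to an empty list while its CryptoMethods does not, which is the observation the paragraph above relies on.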
Even after I switched from KERBEROS to PASSWORD authentication, when I run condor_q as the user condor-admin on the machine running the schedd, the following appears in the log file:

==============================================
07/09/20 13:58:50 DC_AUTHENTICATE: authentication of <192.168.0.68:13883> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
==============================================

I don't think this makes sense, because that username would match the rule I have in my config, wouldn't it? Is there anything here that stands out as something I can investigate further? I seem to be a little stuck in the water.

Thanks all,
Wes

Wesley Taylor, Cluster Manager
Numerica Corporation (www.numerica.us)
5042 Technology Parkway #100
Fort Collins, Colorado 80528
(970) 207 2233
wesley.taylor@xxxxxxxxxxx

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Wesley Taylor
Sent: Tuesday, July 7, 2020 6:34 PM
To: 'htcondor-users@xxxxxxxxxxx' <htcondor-users@xxxxxxxxxxx>
Subject: [External] - [HTCondor-users] Help with authentication and condor mapfile for strong security

Hi all,

I had my Condors hissing and being silent as they should, but then I enabled the Strong security template and as expected, everything stopped working.
I read through the HTCondor security documentation in its entirety, located at:

https://htcondor.readthedocs.io/en/stable/admin-manual/security.html?highlight=mapfile#security

but I still have a few questions:

1. If I am using realmd to configure Kerberos and sssd to work with an Active Directory server, how do I configure Active Directory to have appropriate properties so that I can use Kerberos authentication with HTCondor?

2. How can I verify my HTCondor mapfile is correct? It appears below that my condor_schedd is unable to authenticate with the shared port because there is no mapped uid, but based on the documentation, I am a little fuzzy on how to make a correct mapping for my condor_schedd.

Security config:
==============================================
@use SECURITY : Strong
SEC_PASSWORD_FILE = /etc/condor/passwords.d/POOL
SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD
ALLOW_DAEMON = *
ALLOW_NEGOTIATOR = *
==============================================

SchedLog:
==============================================
07/02/20 19:16:19 ******************************************************
07/02/20 19:16:19 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
07/02/20 19:16:19 ** /usr/sbin/condor_schedd
07/02/20 19:16:19 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
07/02/20 19:16:19 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
07/02/20 19:16:19 ** $CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $
07/02/20 19:16:19 ** $CondorPlatform: x86_64_CentOS7 $
07/02/20 19:16:19 ** PID = 24136
07/02/20 19:16:19 ** Log last touched time unavailable (No such file or directory)
07/02/20 19:16:19 ******************************************************
07/02/20 19:16:19 Using config source: /etc/condor/condor_config
07/02/20 19:16:19 Using local config sources:
07/02/20 19:16:19 /etc/condor/config.d/49-common
07/02/20 19:16:19 /etc/condor/config.d/50-security
07/02/20 19:16:19 /etc/condor/config.d/51-role-exec
07/02/20 19:16:19 /etc/condor/condor_config.local
07/02/20 19:16:19 config Macros = 71, Sorted = 71, StringBytes = 1922, TablesBytes = 2620
07/02/20 19:16:19 CLASSAD_CACHING is ENABLED
07/02/20 19:16:19 Daemon Log is logging: D_ALWAYS D_ERROR
07/02/20 19:16:19 SharedPortEndpoint: waiting for connections to named socket 24123_f333_3
07/02/20 19:16:19 DaemonCore: command socket at <172.20.0.56:9618?addrs=172.20.0.56-9618&noUDP&sock=24123_f333_3>
07/02/20 19:16:19 DaemonCore: private command socket at <172.20.0.56:9618?addrs=172.20.0.56-9618&noUDP&sock=24123_f333_3>
07/02/20 19:16:19 History file rotation is enabled.
07/02/20 19:16:19 Maximum history file size is: 20971520 bytes
07/02/20 19:16:19 Number of rotated history files is: 2
07/02/20 19:16:19 my_popenv: Failed to exec in child, errno=2 (No such file or directory)
07/02/20 19:16:19 Failed to execute /usr/sbin/condor_shadow.std, ignoring
07/02/20 19:16:19 Reloading job factories
07/02/20 19:16:19 Loaded 0 job factories, 0 were paused, 0 failed to load
07/02/20 19:16:25 TransferQueueManager stats: active up=0/100 down=0/100; waiting up=0 down=0; wait time up=0s down=0s
07/02/20 19:16:25 TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
07/02/20 19:16:25 TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
07/02/20 19:16:51 DC_AUTHENTICATE: authentication of <172.20.0.56:41253> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
07/02/20 19:16:51 DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
==============================================

Thank you all for the help as always,
Wes