If you are running a condor_credd on Node1, then you should only need to run "condor_store_cred add" once per user. Node2 should be using the condor_credd on Node1 to fetch credentials when it runs the jobs.
It sounds like this is failing, which is most likely a configuration issue. Let's start by looking at that.
Run
condor_config_val -dump cred
This should print out all of the configuration values that have cred in the name. Node2 should have a value for CREDD_HOST that refers to Node1, since that is where the condor_credd daemon is running.
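If you want to script that check, here is a minimal Python sketch. It is not an HTCondor tool: the function name and the sample hostnames are made up, and it assumes the dump prints one "NAME = value" pair per line, with "#" lines as comments.

```python
def find_credd_host(dump_text):
    """Return the CREDD_HOST value found in the dump text, or None."""
    for line in dump_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip().upper() == "CREDD_HOST":
            return value.strip() or None
    return None


# Hypothetical dump text as it might look on Node2; hostnames are placeholders.
sample = """\
# Configuration from machine: node2.example.com
CREDD_CACHE_LOCALLY = false
CREDD_HOST = node1.example.com
"""
print(find_credd_host(sample))
```

On a real node the text could come from subprocess.run(["condor_config_val", "-dump", "cred"], capture_output=True, text=True).stdout. A result of None on Node2 would mean CREDD_HOST is not set there.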
If that is correct, you need to look in the c:\condor\log\CredLog file on Node1 to see if there are failure messages from Node2 trying to query a user credential when you run condor_submit, or when a job runs on Node2.
You could also run condor_submit in debug mode to see if there are failure messages on the client side when trying to talk to the condor_credd. Just add
-debug:D_COMMAND,D_SECURITY
to the command line when you run condor_submit to see log messages from the client side. You should see it try to query the condor_credd before submitting the job. You can then look in the CredLog for messages at the same time as your condor_submit command to see if there are failures there.
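To line up the CredLog with the time of your condor_submit run, a small filter like the one below can help. This is only a sketch: the function name is made up, the sample messages are placeholders, and it assumes each log line starts with a "MM/DD/YY HH:MM:SS" timestamp like the master log lines quoted further down in this thread.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log line.
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"


def lines_near(log_lines, when, window_seconds=30):
    """Return the log lines stamped within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:17], STAMP_FORMAT)
        except ValueError:
            continue  # continuation line or a different timestamp format
        if abs(stamp - when) <= window:
            matches.append(line)
    return matches


# Placeholder log lines; real CredLog messages will look different.
sample_log = [
    "11/19/24 09:15:02 hypothetical message A",
    "11/19/24 09:15:40 hypothetical message B",
    "11/19/24 10:00:00 hypothetical message C",
]
submit_time = datetime(2024, 11, 19, 9, 15, 30)
for line in lines_near(sample_log, submit_time):
    print(line)
```

If your logs are configured with a different timestamp format, STAMP_FORMAT and the 17-character slice would need to change to match.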
-tj
From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Tuesday, November 19, 2024 7:42 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] windows authentication ht-condor 24.1.1
Hi John,
Thank you for the reply. I have made some progress with my ht-condor Windows pool. I have one node as my central manager, submit, execute, and condor_credd node. I have a second node set up as an execute and submit node. The nodes are in the same NT domain.
On both nodes, I can run
D:\condor_test>condor_status
Name                    OpSys   Arch   State     Activity LoadAv Mem    ActvtyTime
slot1@xxxxxxxxxxxxxxxxx WINDOWS X86_64 Unclaimed Idle     0.000  65239  0+00:04:34
slot1@xxxxxxxxxxxxxxxxx WINDOWS X86_64 Unclaimed Idle     0.000  261782 0+00:00:00
               Total Owner Claimed Unclaimed Matched Preempting Drain Backfill BkIdle
X86_64/WINDOWS     2     0       0         2       0          0     0        0      0
         Total     2     0       0         2       0          0     0        0      0
=================================
I have set my pool password on both nodes with condor_store_cred add -c, followed by condor_reconfig -all.
Next, on my central manager (node1), condor_store_cred add works successfully, and I can run the test sleep job, which runs on node1 and node2.
If, however, I try to submit a job on node2 (my execute and submit node), I get:
D:\condor_test>condor_submit sleep.sub
ERROR: No credential stored for myusername@COMPANY
Correct this by running:
condor_store_cred add
D:\condor_test>condor_store_cred add
Account: myusername@COMPANY
CredType: password
Enter password:
Operation failed.
Make sure your ALLOW_WRITE setting includes this host.
On both nodes I get.
D:\condor_test>condor_config_val ALLOW_WRITE
*
==================
Another strange issue is that while working on setting up my condor pool, I created some jobs that didn't run and are now in the queue with status=HOLD. If I try to remove them using
condor_rm 2.0 I get:
D:\condor_test>condor_rm 2.0
Permission denied to remove job 2.0
==================
Last, I checked the firewall log on both machines, and none of the condor executables are getting blocked. I also don't see any errors in my condor logs.
Thanks so much for your time and help,
Andy
I would recommend you use NTSSPI or IDTOKEN. There is no reason to use PASSWORD if you are running HTCondor 24.
"Host"-based authentication is one of those ideas from the Linux version of HTCondor that doesn't really apply to the Windows version in quite the same way. When we say host-based is bad, what we really mean is that it's bad on Linux because ordinary users can impersonate daemons if you use a host-based security policy.
But on Windows, using NTSSPI, a service on one machine can prove to a service on another machine that it is running as a service using the SYSTEM account. They can do that if both machines are in the same NT domain. So for machines in the same NT domain, NTSSPI is the best choice for authenticating HTCondor daemons to each other. It is also the best choice for authenticating users to the SCHEDD daemon, and for authenticating specific users to run administrative tools like condor_reconfig.
The MSI installer will put the user that ran the installer into the ALLOW_ADMINISTRATOR list by default; you need to add any others by hand. It will add identities for IDTOKEN and PASSWORD to ALLOW_DAEMON by default.
Now, if your machines are not in a common NT domain, the next best choice is IDTOKEN. IDTOKEN can do everything that PASSWORD can, and much more. If two machines have the same secret value in their c:\condor\tokens.sk\POOL file, then IDTOKEN behaves just like PASSWORD, allowing daemons that can read that file to authenticate to each other as either the condor_pool or the condor identity.
Remember that SEC_CLIENT_AUTHENTICATION_METHODS is used by users when they send commands, and by daemons when they act as the client talking to other daemons. You should never set SEC_CLIENT_AUTHENTICATION_METHODS to just PASSWORD, because tools do not have access to the secret necessary to use PASSWORD authentication.
You should generally not need to change SEC_CLIENT_AUTHENTICATION_METHODS or any of the SEC_ knobs away from their default values.
The installer should leave condor_config with something like this in it (johnkn is the user that ran the installer):
...
use SECURITY : recommended_v9_0(SYSTEM, Administrator@*, johnkn@*)
##--------------------------------------------------------------------
## Settings from the the installer questions
##--------------------------------------------------------------------
INSTALL_USER = johnkn
...
You can edit condor_config.local to change the security config, adding ALLOW entries for machines and for additional administrators.
On the central manager, you need to add identifiers of the other machines in the pool to ALLOW_DAEMON like this:
# use the DENIED messages as a guide to the machine names to add here
ALLOW_DAEMON = $(ALLOW_DAEMON) hostname$@company
# or just use a wildcard to match multiple hostnames
ALLOW_DAEMON = $(ALLOW_DAEMON) *name$@company
# give bob the ability to run admin commands like condor_off and condor_reconfig,
ALLOW_ADMINISTRATOR = $(ALLOW_ADMINISTRATOR) bob@company
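
As a rough sketch of using the DENIED messages as a guide, a few lines of Python can pull the identities out of a log and print candidate ALLOW_DAEMON lines to review. This is not an HTCondor tool: the function name is made up, and the regular expression assumes lines shaped like the SECMAN messages quoted later in this thread.

```python
import re

# Assumed shape of the relevant log text, based on the messages in this thread.
DENIED_RE = re.compile(
    r'Received "DENIED" from server for user (\S+) using method (\w+)'
)


def denied_identities(log_lines):
    """Return sorted, de-duplicated (identity, method) pairs from DENIED lines."""
    found = set()
    for line in log_lines:
        match = DENIED_RE.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return sorted(found)


master_log = [
    '11/07/24 14:01:38 SECMAN: FAILED: Received "DENIED" from server '
    'for user hostname$@company using method NTSSPI.',
    '11/07/24 14:01:38 ERROR: SECMAN:2010:Received "DENIED" from server '
    'for user hostname$@company using method NTSSPI.',
]
for identity, method in denied_identities(master_log):
    # Keep the note on its own comment line, separate from the value.
    print(f"# denied via {method}")
    print(f"ALLOW_DAEMON = $(ALLOW_DAEMON) {identity}")
```

Treat the printed lines as suggestions only, and check each identity before adding it to condor_config.local on the central manager.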
-tj
Subject: [HTCondor-users] windows authentication ht-condor 24.1.1
Hi,
I am working on testing ht-condor 24.1.1 on a small network of Windows 11 workstations. These machines are in a secure environment and only authenticated users can access the machines. I need to be able to utilize the run as owner option to launch jobs as the user who submitted them.
Is there a recommended authentication method I should use? I see Host-Based Security mentioned by HTCondor as less secure, but I'm not sure how to set it up.
I tried to set up the recommended security method and did the following.
I successfully set up and ran ht-condor on 1 machine. I have condor_credd running on that machine and can successfully use the run as owner option on that machine. The machine with the initial setup is running collector, credd, master, negotiator, procd, schedd, shared_port, and startd.
Now, I would like to install ht-condor on each user's workstation with submit and execute roles so that users can submit jobs from their workstation and run on anyone's workstation in the pool.
So I installed ht-condor on a 2nd Windows 11 workstation. While reading the documentation, I feel like I just need PASSWORD authentication, so I have created a pool password on both machines with condor_store_cred add -c.
On my 1st machine (Central Manager) which works, condor_store_cred add, works fine.
On my 2nd submit execute machine I get, condor_store_cred add -c
Enter password:
Operation failed.
Make sure your ALLOW_WRITE setting includes this host.
I have tried to allow everything using * for most things,
ALLOW_ADMINISTRATOR = *
ALLOW_READ = *
ALLOW_WRITE = *
ALLOW_CLIENT = *
ALLOW_NEGOTIATOR = *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
If I change to SEC_CLIENT_AUTHENTICATION_METHODS = PASSWORD then my 1st machine doesn't work.
So for now I am using
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
but this gives me the following in the master log file on my 2nd computer.
11/07/24 14:01:38 SECMAN: FAILED: Received "DENIED" from server for user hostname$@company using method NTSSPI.
11/07/24 14:01:38 ERROR: SECMAN:2010:Received "DENIED" from server for user hostname$@company using method NTSSPI.
11/07/24 14:01:38 Failed to start non-blocking update to <ip addres of master:9618>.
I have gone through the config process twice, re-reading the documentation and spending 4 hours each time on it, but still end up with the same issue.
Thanks for the help!
Andy