
Re: [HTCondor-users] CondorCE to Condor authz question [condor 24.3.0, htcondor-ce 24.0.2]



Hi Thomas,

This doesn't address your token problem, but the auth problem you are seeing is not CE-Schedd to LRMS-Schedd but rather CE-Schedd/JobRouter to LRMS-Collector. Since the CE-Schedd and LRMS-Collector do *not* run on the same machine, they cannot auth via FS.

Even if the CE-Schedd and LRMS-Schedd run on the same machine, the CE-Schedd does not actually know that. It asks the LRMS-Collector for the address of the LRMS-Schedd, then connects to that address.
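
If you want to reproduce that lookup by hand, a minimal sketch with the Python bindings could look like the following. The function name `locate_lrms_schedd` is made up for illustration, and I am assuming that pointing CONDOR_CONFIG at /etc/condor-ce/condor_config makes the bindings pick up the CE's security settings the way the condor_ce_* tools do. Treat it as a rough approximation of what the JobRouter does, not its actual code path.

```python
import os


def locate_lrms_schedd(pool, schedd_name, config="/etc/condor-ce/condor_config"):
    """Ask the collector at `pool` for the address of the schedd `schedd_name`."""
    # Assumption: with CONDOR_CONFIG pointing at the CE configuration, the
    # bindings use the CE's authentication settings instead of the LRMS ones.
    os.environ["CONDOR_CONFIG"] = config

    # Imported only after CONDOR_CONFIG is set, because the bindings load the
    # configuration when the module is imported.
    import htcondor

    collector = htcondor.Collector(pool)
    # This is a query to the remote collector. If it has to authenticate, FS
    # can only succeed when client and collector share a filesystem.
    ad = collector.locate(htcondor.DaemonTypes.Schedd, schedd_name)
    # MyAddress is the sinful string the client would connect to next.
    return ad["MyAddress"]
```

Run as root on the CE, a call such as locate_lrms_schedd("grid-htc-preprod-master01.desy.de:9618", "grid-htc-preprod-ce01.desy.de") should either return the schedd address or fail with an authentication error similar to the one in your JobRouterLog.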

Cheers,
Max

> On 18. Mar 2025, at 15:19, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
> 
> Hi all,
> 
> I encountered an odd(?) issue with our preproduction CondorCE where the submission from the CE to the Condor sched failed with FS (and other authentication methods).
> 
> That is, I had reconfigured and re-set up our preproduction cluster. Submission from the Condor sched to the collector/negotiator worked, but jobs submitted to the CE (via SSL) failed because the CE sched could not submit them to the Condor sched. According to the CE route and the Condor sched, all authentication methods failed, including FS, even though both run on the same node and have access to the same /tmp (so no elaborate unit isolation or similar) [1.ce,1.condor]
> 
> All daemons on the CE or the central manager have FS as first authentication method followed by TOKEN etc. [2]
> Daemon-to-daemon is secured by idtokens (encrypted password is still rolled out due to legacy, but should not get picked up). The Condor token got rolled out for both, the CE and the Condor sched [3]
> 
> So, I would have assumed that the CE should be able to submit to the Condor sched with FS using /tmp/..., with the Condor sched further submitting the job via token authentication. That did not work.
> 
> Only later did I notice that the CE complained about the token ownership, i.e., the file was owned by the `condor` user and the CE expected it to be owned by `root` [4]. After I re-owned the CE's token file, the submission from the CE to the Condor sched worked.
> This led to a slightly odd(?) state where the token file for the CE is owned by `root` while the same file for the Condor sched is owned by `condor` [4].
> 
> While it works, I am a bit curious why the FS submission failed and why the ownership needs to be `root` for the CE? Maybe somebody has an idea?
> 
> Installed versions are as [5].
> 
> Cheers,
>  Thomas
> 
> 
> 
> [1.ce]
> >/var/log/condor-ce/JobRouterLog
> 
> 03/17/25 14:38:11 SECMAN: required authentication with collector at <131.169.223.129:9618> failed, so aborting command QUERY_SCHEDD_ADS.
> 03/17/25 14:38:11 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS
> 03/17/25 14:38:11 ERROR (schedd grid-htc-preprod-ce01.desy.de at pool grid-htc-preprod-master01.desy.de:9618) Can't find address of schedd
> 03/17/25 14:38:11 JobRouter failure (src=1.0,route=Condor_Pool): failed to submit job
> 
> [1.condor]
> > /var/log/condor/SchedLog
> 03/17/25 16:18:53 (pid:3146106) (D_SECURITY) AUTHENTICATE: will try to use 4 (FS)
> 03/17/25 16:18:53 (pid:3146106) (D_SECURITY) AUTHENTICATE: do_authenticate is 1.
> 03/17/25 16:18:53 (pid:3146106) (D_SECURITY) AUTHENTICATE_FS: used dir /tmp/FS_XXXlaUnc7, status: 0
> 03/17/25 16:18:53 (pid:3146106) (D_SECURITY) AUTHENTICATE: method 4 (FS) failed.
> 
> [2]
> [root@grid-htc-preprod-master01 condor]#  condor_config_val SEC_CLIENT_AUTHENTICATION_METHODS SEC_DEFAULT_AUTHENTICATION_METHODS
> Not defined: SEC_CLIENT_AUTHENTICATION_METHODS
> FS,IDTOKENS,KERBEROS,SCITOKENS,SSL
> 
> [root@grid-htc-preprod-ce01 condor-ce]# condor_config_val SEC_CLIENT_AUTHENTICATION_METHODS SEC_DEFAULT_AUTHENTICATION_METHODS
> Not defined: SEC_CLIENT_AUTHENTICATION_METHODS
> FS,IDTOKENS,KERBEROS,SCITOKENS,SSL
> [root@grid-htc-preprod-ce01 ~]# condor_ce_config_val SEC_CLIENT_AUTHENTICATION_METHODS SEC_DEFAULT_AUTHENTICATION_METHODS
> FS, TOKEN, SCITOKENS, SSL
> FS
> 
> 
> [3]
> [root@grid-htc-preprod-ce01 ~]# md5sum /etc/condor-ce/tokens.d/accesspoint-condorce-grid /etc/condor/tokens.d/accesspoint-condorce-grid
> 035b5c1a4aea14f63bbd1d67b355edb3 /etc/condor-ce/tokens.d/accesspoint-condorce-grid
> 035b5c1a4aea14f63bbd1d67b355edb3 /etc/condor/tokens.d/accesspoint-condorce-grid
> 
> 
> [4]
> 03/18/25 13:47:08 ERROR: read_secure_file(/etc/condor-ce/tokens.d/accesspoint-condorce-grid): file must be owned by uid 0, was uid 25411
> 
> [root@grid-htc-preprod-ce01 ~]# ls -hall /etc/condor-ce/tokens.d/accesspoint-condorce-grid /etc/condor/tokens.d/accesspoint-condorce-grid
> -rw-------. 1 root   root   724 Mar 17 16:17 /etc/condor-ce/tokens.d/accesspoint-condorce-grid
> -rw-------. 1 condor condor 724 Mar 17 16:17 /etc/condor/tokens.d/accesspoint-condorce-grid
> 
> 
> [5]
> condor-24.3.0-1.el9.x86_64
> condor-placeholder-0.0.0-0.el9.noarch
> condor-upgrade-checks-23.10.20-1.el9.x86_64
> htcondor-ce-24.0.2-1.el9.noarch
> htcondor-ce-bdii-24.0.2-1.el9.noarch
> htcondor-ce-client-24.0.2-1.el9.noarch
> htcondor-ce-condor-24.0.2-1.el9.noarch
> python3-condor-24.3.0-1.el9.x86_64
> 
> 
> 
