
Re: [HTCondor-users] condor_ssh_to_job & (remote) DAG



Hi Ben,

The problem is that when the condor_dagman process invokes the job submission code, the security session it establishes authenticates as "bejones@fsauth", whereas the remote submission from condor_submit (or the connection from condor_ssh_to_job) authenticates as "bejones@xxxxxxx".  Because the two strings differ, the AP treats them as different owners, and only the job's owner can use condor_ssh_to_job.
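A minimal Python model of that comparison, assuming (as the OwnerCheck log line quoted further down suggests) that the check is an exact match on the full user@domain string. `owner_check` is a hypothetical name, not HTCondor's implementation, and `EXAMPLE.REALM` stands in for the elided domain:

```python
def owner_check(authenticated: str, ad_owner: str) -> bool:
    """Hypothetical model: access is allowed only when the authenticated
    identity string is exactly equal to the owner recorded in the job ad."""
    return authenticated == ad_owner


# Owner recorded when condor_dagman submitted the node job over FS.
job_owner = "bejones@fsauth"
# Identity condor_ssh_to_job presents after Kerberos authentication;
# EXAMPLE.REALM is a placeholder for the elided domain.
requester = "bejones@EXAMPLE.REALM"

print(owner_check(requester, job_owner))         # False: domain parts differ
print(owner_check("bejones@fsauth", job_owner))  # True: identical strings
```

Same local user name, different domain part, so the model rejects the request.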

So, what's the difference?
- A potentially subtle change in the effective config that puts FS authentication ahead of KERBEROS.
 - For example, if DAGMan has started using direct submission instead of forking condor_submit, the evaluated value of SEC_CLIENT_AUTHENTICATION_METHODS could change.  Consider:

DAGMAN.SEC_CLIENT_AUTHENTICATION_METHODS = FS
SEC_CLIENT_AUTHENTICATION_METHODS = KERBEROS

With a configuration like this, the authentication method would change depending on whether DAGMan submits directly or forks condor_submit.

- Something subtle in the DAGMan runtime environment in the scheduler universe that breaks Kerberos authentication, causing a fallback to FS.
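Both possibilities can be pictured with a small Python model: a `DAGMAN.`-prefixed knob overriding the plain one for the DAGMAN subsystem, and a client that tries methods in list order and keeps the identity from the first one that succeeds. This is a sketch under stated assumptions, not HTCondor's config parser or negotiation code; `lookup` and `authenticate` are hypothetical helpers, modeling a forked condor_submit as subsystem `SUBMIT` is an assumption, and `EXAMPLE.REALM` is a placeholder for the elided domain:

```python
from __future__ import annotations


def lookup(config: dict[str, str], knob: str, subsystem: str) -> str | None:
    """Simplified model: a SUBSYSTEM.KNOB entry overrides the plain KNOB
    for processes running as that subsystem (case handling omitted)."""
    return config.get(f"{subsystem}.{knob}", config.get(knob))


def authenticate(methods: str | None, kerberos_ok: bool,
                 user: str = "bejones") -> str | None:
    """Try methods in list order and return the identity produced by the
    first one that succeeds (hypothetical model, not HTCondor's code)."""
    if not methods:
        return None
    identities = {
        "KERBEROS": f"{user}@EXAMPLE.REALM",  # placeholder for the elided domain
        "FS": f"{user}@fsauth",              # FS assumed usable: DAGMan runs on the AP
    }
    for method in (m.strip().upper() for m in methods.split(",")):
        if method == "KERBEROS" and not kerberos_ok:
            continue  # e.g. no usable Kerberos credential in this environment
        if method in identities:
            return identities[method]
    return None


config = {
    "DAGMAN.SEC_CLIENT_AUTHENTICATION_METHODS": "FS",
    "SEC_CLIENT_AUTHENTICATION_METHODS": "KERBEROS",
}
knob = "SEC_CLIENT_AUTHENTICATION_METHODS"

# First possibility: forked condor_submit versus direct submission in DAGMan.
print(authenticate(lookup(config, knob, "SUBMIT"), kerberos_ok=True))  # bejones@EXAMPLE.REALM
print(authenticate(lookup(config, knob, "DAGMAN"), kerberos_ok=True))  # bejones@fsauth

# Second possibility: KERBEROS is listed first but fails in DAGMan's environment.
print(authenticate("KERBEROS, FS", kerberos_ok=False))  # bejones@fsauth
```

Either way the model ends up with "bejones@fsauth", which is the owner string in the rejection.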

Know what would help?  Could you send a DAGMan log with D_SECURITY:2 enabled?
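When lining that log up with the AP side, a small Python sketch can pull both identities out of each OwnerCheck rejection. It assumes the rejection line format quoted further down in this thread; other log lines and formats are not handled, and `find_rejections` is a hypothetical helper:

```python
import re
import sys

# Matches the rejection line format quoted in this thread, e.g.
# "OwnerCheck reject, 'user@a' not ad owner: 'user@b' UID_DOMAIN=..."
OWNER_CHECK = re.compile(
    r"OwnerCheck reject, '(?P<requester>[^']+)' not ad owner: '(?P<owner>[^']+)'"
)


def find_rejections(lines):
    """Yield (requesting identity, job owner) for each OwnerCheck rejection."""
    for line in lines:
        match = OWNER_CHECK.search(line)
        if match:
            yield match.group("requester"), match.group("owner")


if __name__ == "__main__":
    # Pass log file paths as arguments; with none, scan the sample line.
    if len(sys.argv) > 1:
        for path in sys.argv[1:]:
            with open(path, errors="replace") as log:
                for requester, owner in find_rejections(log):
                    print(f"{path}: {requester} != {owner}")
    else:
        sample = ("(D_ALWAYS:2) OwnerCheck reject, 'bejones@xxxxxxx' "
                  "not ad owner: 'bejones@fsauth' UID_DOMAIN=cern.ch")
        for requester, owner in find_rejections([sample]):
            print(f"{requester} != {owner}")
```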

In general, I strongly suggest configuring things so that the same "user" identifier results regardless of which authentication method is used.  We tend to have subtle assumptions based on the identity not changing...
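One way to picture that recommendation: if every method's identity is mapped to one canonical name before the owner check, the exact-match comparison no longer depends on which method won. The mapping table and the choice of `cern.ch` (the UID_DOMAIN from the log line) as the canonical domain are illustrative assumptions; in a real pool this would come from HTCondor's own identity-mapping configuration, which is not shown here:

```python
# Hypothetical mapping from each method's identity to one canonical name.
# EXAMPLE.REALM stands in for the elided Kerberos domain; using the
# UID_DOMAIN (cern.ch) as the canonical domain is an illustrative choice.
CANONICAL = {
    "bejones@EXAMPLE.REALM": "bejones@cern.ch",
    "bejones@fsauth": "bejones@cern.ch",
}


def canonicalize(identity: str) -> str:
    """Map an authenticated identity to its canonical name (unknown ones pass through)."""
    return CANONICAL.get(identity, identity)


def owner_check(authenticated: str, ad_owner: str) -> bool:
    """Exact string comparison, applied after mapping the requester."""
    return canonicalize(authenticated) == ad_owner


# The owner is recorded in canonical form when DAGMan submits over FS...
job_owner = canonicalize("bejones@fsauth")
# ...so a later Kerberos-authenticated condor_ssh_to_job now matches.
print(owner_check("bejones@EXAMPLE.REALM", job_owner))  # True
```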

Brian

> On Jul 15, 2025, at 12:32 PM, Ben Jones via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> 
> 
> 
>> On 15 Jul 2025, at 17:46, Greg Thain via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>> 
>> On 7/15/25 07:56, Ben Jones wrote:
>>> Hi,
>>> 
>>> Since at least 24.0.7, we have had a problem using condor_ssh_to_job with jobs submitted via DAG. 
>>> 
>>> We get errors like this:
>>> 
>>> $ condor_ssh_to_job 915.0
>>> bejones is not authorized for access to the starter for job 915.0
>>> 
>>> Our submissions are always remote, and from the log it seems the permission check includes the authentication method, i.e.:
>>> 
>>> (D_ALWAYS:2) OwnerCheck reject, 'bejones@xxxxxxx' not ad owner: 'bejones@fsauth' UID_DOMAIN=cern.ch
>> 
>> 
>> Ben:
>> Just to be clear, are you running condor_ssh_to_job from the same machine you run "condor_submit -remote" from?  Or on the AP where the schedd runs?
> 
> Hi Greg,
> 
> Not condor_submit -remote, plain old condor_submit, though with SCHEDD_HOST & CREDD_HOST set.
> 
> But otherwise, yes, condor_ssh_to_job is being run from a remote machine, and it happens to be the same remote machine that I submitted from. This doesn't matter, though (in the working case):
> 
> [bejones@lxplus9105 condor]$ export _condor_CONDOR_HOST="sleepybird02.cern.ch"
> export _condor_SCHEDD_HOST="babybird01.cern.ch"
> export _condor_CREDD_HOST="babybird01.cern.ch"
> [bejones@lxplus9105 condor]$ condor_q -const 'JobUniverse =?= 5'  -af 'join(".", ClusterId, ProcId)' DAGNodeName
> 921.0 undefined
> 923.0 A
> [bejones@lxplus9105 condor]$ condor_ssh_to_job 921.0
> Welcome to slot1_1@xxxxxxxxxxxxxxxxxxxx!
> Your condor job is running with pid(s) 3093115.
> [bejones@b9jantest662 dir_3093113]$
> 
> From another (also remote):
> [bejones@aiadm02 ~]$ condor_ssh_to_job 921.0
> Welcome to slot1_1@xxxxxxxxxxxxxxxxxxxx!
> Your condor job is running with pid(s) 3093115 3093425.
> [bejones@b9jantest662 dir_3093113]$
> 
> The DAG one:
> 
> [bejones@lxplus9105 condor]$ condor_ssh_to_job 923.0
> bejones is not authorized for access to the starter for job 923.0
> 
> And for completeness:
> 
> [bejones@aiadm02 ~]$ condor_ssh_to_job 923.0
> bejones is not authorized for access to the starter for job 923.0
> 
> Whilst I'm at it (I thought this was a separate issue), condor_ssh_to_job by a queue super user doesn't seem to work now either.
> 
> This is run on the schedd host, and it fails for both types:
> 
> [root@babybird01 ~]# condor_ssh_to_job 921.0
> condor is not authorized for access to the starter for job 921.0
> [root@babybird01 ~]# condor_ssh_to_job 923.0
> condor is not authorized for access to the starter for job 923.0
> 
> Possibly relevant settings from the config:
> 
> [root@babybird01 ~]# condor_config_val QUEUE_SUPER_USERS
> root, condor
> [root@babybird01 ~]# condor_config_val -dump | grep SSH
> CONDOR_SSH_TO_JOB_FAKE_PASSWD_ENTRY = true
> ENABLE_SSH_TO_JOB = True
> OPSYSSHORTNAME = RedHat
> SCHEDD_ENABLE_SSH_TO_JOB = True
> SSH_KEYGEN =
> SSH_KEYGEN_ARGS =
> SSH_TO_JOB_SSHD_CONFIG_TEMPLATE = /etc/condor/condor_ssh_to_job_sshd_config_template
> SSHD =
> SSHD_ARGS =
> 
> cheers,
> Ben
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> 
> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/