Re: [HTCondor-users] condor_ssh_to_job & (remote) DAG

On 15 Jul 2025, at 17:46, Greg Thain via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

On 7/15/25 07:56, Ben Jones wrote:
Hi,

Since at least 24.0.7, we have had a problem using condor_ssh_to_job on jobs that were submitted via a DAG.

We get errors like this:

$ condor_ssh_to_job 915.0
bejones is not authorized for access to the starter for job 915.0

Our submissions are always remote, and from the log it seems that the reason is that the permission check includes the method, i.e.:

(D_ALWAYS:2) OwnerCheck reject, 'bejones@xxxxxxx' not ad owner: 'bejones@fsauth' UID_DOMAIN=cern.ch


Ben:

Just to be clear, are you running condor_ssh_to_job from the same machine you run "condor_submit -remote" from?  Or on the AP with the schedd running on it?


Hi Greg,

Not condor_submit -remote, plain old condor_submit, though with SCHEDD_HOST & CREDD_HOST set.

But otherwise, yes, condor_ssh_to_job is being run from a remote machine, and happens to be the same remote machine that I submitted from. This doesn't matter though (in the working case):

[bejones@lxplus9105 condor]$ export _condor_CONDOR_HOST="sleepybird02.cern.ch"
export _condor_SCHEDD_HOST="babybird01.cern.ch"
export _condor_CREDD_HOST="babybird01.cern.ch"
[bejones@lxplus9105 condor]$ condor_q -const 'JobUniverse =?= 5'  -af 'join(".", ClusterId, ProcId)' DAGNodeName
921.0 undefined
923.0 A
[bejones@lxplus9105 condor]$ condor_ssh_to_job 921.0
Welcome to slot1_1@xxxxxxxxxxxxxxxxxxxx!
Your condor job is running with pid(s) 3093115.
[bejones@b9jantest662 dir_3093113]$

From another machine (also remote):
[bejones@aiadm02 ~]$ condor_ssh_to_job 921.0
Welcome to slot1_1@xxxxxxxxxxxxxxxxxxxx!
Your condor job is running with pid(s) 3093115 3093425.
[bejones@b9jantest662 dir_3093113]$

The DAG one:

[bejones@lxplus9105 condor]$ condor_ssh_to_job 923.0
bejones is not authorized for access to the starter for job 923.0

And for completeness:

[bejones@aiadm02 ~]$ condor_ssh_to_job 923.0
bejones is not authorized for access to the starter for job 923.0
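
To make the mismatch in the OwnerCheck log line quoted earlier in this thread easier to see, here is a minimal sketch that splits the two identities into user and domain parts. It is only an illustration of what the log line says, not how HTCondor itself performs the check; the helper names parse_owner_check and split_identity are made up for this example.

```python
import re

# Log line copied verbatim from the OwnerCheck rejection quoted in this thread.
LOG_LINE = (
    "(D_ALWAYS:2) OwnerCheck reject, 'bejones@xxxxxxx' not ad owner: "
    "'bejones@fsauth' UID_DOMAIN=cern.ch"
)


def parse_owner_check(line):
    """Return (requester, ad_owner) from an OwnerCheck reject line, or None.

    Hypothetical helper: the pattern is derived only from the one line above.
    """
    m = re.search(r"OwnerCheck reject, '([^']+)' not ad owner: '([^']+)'", line)
    return m.groups() if m else None


def split_identity(identity):
    """Split 'user@domain' into (user, domain); domain is '' if absent."""
    user, _, domain = identity.partition("@")
    return user, domain


requester, ad_owner = parse_owner_check(LOG_LINE)
req_user, req_domain = split_identity(requester)
own_user, own_domain = split_identity(ad_owner)

# The user part matches; only the part after '@' differs.
print("same user:  ", req_user == own_user)
print("same domain:", req_domain == own_domain)
```

The user part is identical in both identities and only the part after the "@" differs, which is consistent with the rejection being about the recorded domain/method rather than the user name.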

Whilst I'm at it (I had thought this was a separate issue): condor_ssh_to_job as a queue super user doesn't seem to work now either.

This is from the schedd, and it fails for both types of job:

[root@babybird01 ~]# condor_ssh_to_job 921.0
condor is not authorized for access to the starter for job 921.0
[root@babybird01 ~]# condor_ssh_to_job 923.0
condor is not authorized for access to the starter for job 923.0

Maybe relevant stuff from config:

[root@babybird01 ~]# condor_config_val QUEUE_SUPER_USERS
root, condor
[root@babybird01 ~]# condor_config_val -dump | grep SSH
CONDOR_SSH_TO_JOB_FAKE_PASSWD_ENTRY = true
ENABLE_SSH_TO_JOB = True
OPSYSSHORTNAME = RedHat
SCHEDD_ENABLE_SSH_TO_JOB = True
SSH_KEYGEN =
SSH_KEYGEN_ARGS =
SSH_TO_JOB_SSHD_CONFIG_TEMPLATE = /etc/condor/condor_ssh_to_job_sshd_config_template
SSHD =
SSHD_ARGS =

cheers,
Ben