Re: [HTCondor-users] condor_ssh_to_job & (remote) DAG
- Date: Wed, 16 Jul 2025 13:47:53 +0000
- From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] condor_ssh_to_job & (remote) DAG
> On Jul 16, 2025, at 7:54 AM, Ben Jones <ben.dylan.jones@xxxxxxx> wrote:
>
> Hi Brian,
>
>>
>> Know what would help? Could you send a DAGMan log with D_SECURITY:2?
>
> Sure, I'll generate and send one off-list.
The logs didn't have D_SECURITY:2 enabled but did have D_SECURITY turned on. That gives some hints, though not the full picture. Here's a relevant snippet from v23:
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) SECMAN: Auth methods: FS,KERBEROS
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) AUTHENTICATE: setting timeout for <ADDRESS> to 20.
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) HANDSHAKE: in handshake(my_methods = 'FS,KERBEROS')
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) HANDSHAKE: handshake() - i am the client
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) HANDSHAKE: sending (methods == 68) to server
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) HANDSHAKE: server replied (method = 4)
07/16/25 14:16:08 (fd:8) (pid:2712912) (D_SECURITY) AUTHENTICATE_FS: used dir /tmp/FS_XXX8oSwAQ, status: 1
v24 didn't have any of the expected security logs; not sure why.
Regardless, this shows that v23 was using FS auth to submit the jobs, so the "mismatch" probably occurred in v23.
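To make the handshake lines easier to read: the numbers look like a bitmask of authentication methods. Here is a minimal sketch of that reading. The bit values are an assumption inferred only from this snippet (the client offered 'FS,KERBEROS' as 68, and the server's reply of 4 was followed by AUTHENTICATE_FS, which leaves 64 for KERBEROS); the table and helper names are hypothetical, not HTCondor's API.

```python
# Assumed bit values, inferred from the log snippet above and nothing else:
# 68 was sent for 'FS,KERBEROS' and the reply 4 led to AUTHENTICATE_FS.
AUTH_BITS = {"FS": 4, "KERBEROS": 64}


def encode_methods(names):
    """Combine method names into a single bitmask."""
    mask = 0
    for name in names:
        mask |= AUTH_BITS[name]
    return mask


def decode_methods(mask):
    """List the known method names whose bit is set in the mask."""
    return [name for name, bit in AUTH_BITS.items() if mask & bit]


offered = encode_methods("FS,KERBEROS".split(","))
print(offered)                  # the value the client sent
print(decode_methods(offered))  # methods offered by the client
print(decode_methods(4))        # method chosen by the server
```

Under that reading, "methods == 68" is the client offering both methods and "method = 4" is the server picking FS.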
One messy area that's taken a long time to clean up is the difference between the "Unix user" that the AP will use to read/write files for the job and the "Owner" of the job. It's been assumed that the Unix user can be found by simply taking everything before the "@", and then, internally, things have occasionally used the user when they really meant to use the owner.
There's been quite a bit of cleaning up here. At first blush, I might lean toward saying that v24 is doing the "right thing" because what v23 is doing is giving "someone else" (as defined by your config) SSH access to your job.
Greg, can you confirm?
>
>> In general, I strongly suggest the same "user" identifier to result regardless of what authentication method is used. We tend to have subtle assumptions based on the identity not changing...
>
> I don't disagree, but we have forever had:
>
> KERBEROS /^([^@\/]*)@(.*)$/ \1@\2
> FS /(.*)/ \1@fsauth
>
Yup, just philosophizing in general. However, it might be the culprit here...
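To illustrate why those two map lines could matter, here is a rough sketch that applies the same regexes in Python. It is not how HTCondor evaluates its map file; the principal "alice@EXAMPLE.ORG" and the helper names are made up for illustration. The point is only that the same person ends up with two different identities depending on the method, even though the part before the "@" is identical.

```python
import re

# The two rules quoted above: (method, pattern, replacement).
MAP_RULES = [
    ("KERBEROS", r"^([^@/]*)@(.*)$", r"\1@\2"),
    ("FS", r"(.*)", r"\1@fsauth"),
]


def map_identity(method, authenticated_name):
    """Return the mapped identity for the first rule that matches, else None."""
    for rule_method, pattern, replacement in MAP_RULES:
        if rule_method != method:
            continue
        match = re.search(pattern, authenticated_name)
        if match:
            return match.expand(replacement)
    return None


def unix_user(identity):
    """The shortcut described earlier: everything before the '@'."""
    return identity.split("@", 1)[0]


# Hypothetical principal; the realm is a placeholder.
via_kerberos = map_identity("KERBEROS", "alice@EXAMPLE.ORG")
via_fs = map_identity("FS", "alice")

print(via_kerberos, via_fs)
print(via_kerberos == via_fs)                        # full identities differ
print(unix_user(via_kerberos) == unix_user(via_fs))  # the short names agree
```

So anything that compares the full identity sees two different users, while anything that compares only the short name sees one.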
>
> Btw, whatever the differences, it's not just DAG, since as I mention, a queue super user on the AP can't ssh to any job (on v24):
>
> [root@babybird02 ~]# condor_q -all
>
>
> -- Schedd: babybird02.cern.ch : <188.184.96.224:22845?... @ 07/16/25 14:52:28
> OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
> bejones ID: 195 7/16 14:52 _ 1 _ 1 195.0
> bejones DAG: 196 7/16 14:52 _ 1 _ 1 197.0
>
> Total for query: 2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended
> Total for all users: 2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended
>
> [root@babybird02 ~]# condor_ssh_to_job 195.0
> condor is not authorized for access to the starter for job 195.0
> [root@babybird02 ~]# condor_ssh_to_job 197.0
> condor is not authorized for access to the starter for job 197.0
> [root@babybird02 ~]# condor_config_val QUEUE_SUPER_USERS
> root, condor
>
>
> cheers,
> Ben