
Re: [HTCondor-users] condor_rm job, Permission denied to force removal



In the CreddLog on node1, I found these errors:
getStoredPassword(): Could not locate credential for user 'condor_pool@COMPANY'
01/31/25 09:05:45 Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.003054s
01/31/25 09:05:45 Calling Handler <DaemonCommandProtocol::WaitForSocketData> (2)
01/31/25 09:05:45 DC_AUTHENTICATE: required authentication of 10.29.4.91 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
01/31/25 09:05:45 Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.000477s

In the StartLog on node2, I have these errors:
01/31/25 08:53:18 SECMAN: required authentication with credd node1.company.com failed, so aborting command CREDD_NOP.
01/31/25 08:53:18 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
01/31/25 09:03:18 SECMAN: required authentication with credd p01200537.schaeffler.com failed, so aborting command CREDD_NOP.
01/31/25 09:03:18 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD


Is this an issue with the pool password? Do I need a pool password if I am using condor_credd to authenticate and have stored my user's password with condor_store_cred add?

I do need to review John's answer to a previous question I asked about a similar issue when I was starting out.

Thanks for the help,
Andy


On Fri, Jan 31, 2025 at 9:09 AM Andy Barr <ajbarr@xxxxxxxxx> wrote:
Hi,
I thought everything was working nicely, but now when I try to run a job from node2, for some reason it isn't finding Target.LocalCredd.

condor_status -af Name LocalCredd
slot1@xxxxxxxxxxxxxxxxx node1.company.com
slot1@xxxxxxxxxxxxxxxxx undefined
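
A quick way to list the slots that are not advertising a credd is to filter that output. The sketch below assumes the two-column `Name LocalCredd` format shown above; the function name and the sample slot names are hypothetical.

```python
def slots_without_credd(status_text):
    """Return slot names whose LocalCredd column is 'undefined'."""
    names = []
    for line in status_text.splitlines():
        fields = line.split()
        # Expect exactly two columns: Name and LocalCredd.
        if len(fields) == 2 and fields[1] == "undefined":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    # Hypothetical sample in the same shape as the output above.
    sample = """slot1@node1.company.com node1.company.com
slot1@node2.company.com undefined
"""
    for name in slots_without_credd(sample):
        print(f"{name} has no LocalCredd")
```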

From node2:
condor_config_val CREDD_HOST
node1.company.com

From node1:
condor_config_val CREDD_HOST
node1.company.com

This issue started after I made these changes to fix the condor_rm problem:

UID_DOMAIN = COMPANY
PRIOR_UID_DOMAIN = $(FULL_HOSTNAME)

Thanks,
Andy

On Thu, Jan 23, 2025 at 2:32 PM Andy Barr <ajbarr@xxxxxxxxx> wrote:
Hi John,
You were correct and that solved my issue. Thanks so much for the help.
In case anyone else ever has this issue:

I added these lines to my condor config and restarted condor; after that I could remove the jobs.

UID_DOMAIN = COMPANY
PRIOR_UID_DOMAIN = $(FULL_HOSTNAME)

I added these values on all the machines in my condor pool.

Thanks,
Andy

On Thu, Jan 23, 2025 at 11:11 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
Ok. I think I might see the problem.

The Schedd thinks your identity is ajbarr@xxxxxxxxxxxxxxxxx, but NTSSPI authenticates you as ajbarr@COMPANY.

Even with case-insensitive prefix matching, these two do not match. So when you run condor_rm, the Schedd can't find a user record for ajbarr@COMPANY and calls you "anonymous user": it sees you trying to remove jobs, but you have no jobs in the Schedd.

To fix this, make sure that the UID_DOMAIN configuration value in the Schedd matches the NTDOMAIN value for all users that will be using NTSSPI authentication.

Run

condor_config_val -v UID_DOMAIN

on your Schedd. I suspect it is set to the default, which is $(FULL_HOSTNAME). This is the correct default for Windows machines that are not part of an NT Domain, but for machines that are part of an NT Domain, UID_DOMAIN should be the same as the NT Domain name.

If you need to change the UID_DOMAIN of your Schedd, you should also set PRIOR_UID_DOMAIN to the old value in the configuration before you restart. That way the Schedd should fix up ownership of jobs and user records that were using the old UID_DOMAIN value when it starts up.
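
As a rough way to spot this kind of mismatch, the sketch below compares the domain part of the USER column from condor_qusers with the NTDOMAIN column. It assumes the first three whitespace-separated columns are USER, OWNER, and NTDOMAIN, as in the condor_qusers output quoted elsewhere in this thread. The function name and the sample hostname are hypothetical, and the comparison is a plain case-insensitive equality check, not the Schedd's actual matching logic.

```python
def find_domain_mismatches(qusers_text):
    """Return (user, ntdomain) pairs where USER's domain differs from NTDOMAIN."""
    mismatches = []
    for line in qusers_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and rows without an '@' in USER.
        if len(fields) < 3 or fields[0] == "USER" or "@" not in fields[0]:
            continue
        user, ntdomain = fields[0], fields[2]
        uid_domain = user.split("@", 1)[1]
        if uid_domain.lower() != ntdomain.lower():
            mismatches.append((user, ntdomain))
    return mismatches


if __name__ == "__main__":
    # Hypothetical rows: the first uses a hostname as its domain, the second
    # already uses the NT Domain name.
    sample = """USER OWNER NTDOMAIN ENABLED
ajbarr@node1.company.com ajbarr COMPANY yes
ajbarr@COMPANY ajbarr COMPANY yes
"""
    for user, ntdomain in find_domain_mismatches(sample):
        print(f"{user}: domain part does not match NTDOMAIN {ntdomain}")
```

Any user it reports would be a candidate for the UID_DOMAIN / PRIOR_UID_DOMAIN change described above.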

-tj


From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Thursday, January 23, 2025 6:53 AM
To: John M Knoeller <johnkn@xxxxxxxxxxx>
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_rm job, Permission denied to force removal

Hi,
condor_qusers
USER               OWNER  NTDOMAIN  ENABLED MAX_RUN JOBS:Idle Running Held Removed Completed
ajbarr@xxxxxxxxxxxxxxxxx ajbarr COMPANY yes   default     0    0    1

Thanks,
Andy

On Wed, Jan 22, 2025 at 4:28 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
That seems like a reasonable guess.

What does

  condor_qusers

show as your full username?

-tj


From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Wednesday, January 22, 2025 7:25 AM
To: John M Knoeller <johnkn@xxxxxxxxxxx>
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_rm job, Permission denied to force removal

Hi John,
Thanks for the help. I have attached the SchedLog output after adding SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY:2 and running the command
condor_rm 2.0

Could this be an upper/lower case issue?

One thing I see in the log is this:
AuthenticatedName = "ajbarr@COMPANY"

But after that I see this,
User = "ajbarr@company"

Note: I sanitized the log and replaced the hostnames with general names.

Thanks,
Andy





On Tue, Jan 21, 2025 at 5:54 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
01/21/25 13:15:01 HANDSHAKE: in handshake(my_methods = 'NTSSPI,PASSWORD')
01/21/25 13:15:01 HANDSHAKE: handshake() - i am the client
01/21/25 13:15:01 HANDSHAKE: sending (methods == 528) to server
01/21/25 13:15:01 HANDSHAKE: server replied (method = 16)
01/21/25 13:15:01 Authentication was a Success.
01/21/25 13:15:01 AUTHENTICATION: setting default map to (null)
01/21/25 13:15:01 AUTHENTICATION: post-map: current FQU is '(null)'

This shows that NTSSPI (method bit 16) was the authentication method used, but for some reason
the authenticated identity could not be converted to a username. (FQU is the fully qualified user.)
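
The bitmask arithmetic in the handshake lines can be checked with a short sketch. The two bit values used here are inferred only from this log (the client offers 'NTSSPI,PASSWORD' as 528 and NTSSPI is bit 16, which leaves 512 for PASSWORD). The function name is made up for illustration, and this is not HTCondor's own code.

```python
# Bit values inferred from the log above; other methods are not listed here.
AUTH_METHOD_BITS = {
    "NTSSPI": 16,
    "PASSWORD": 512,
}


def decode_auth_methods(mask):
    """Return the known method names whose bit is set in mask."""
    return [name for name, bit in AUTH_METHOD_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_auth_methods(528))  # what the client offered
    print(decode_auth_methods(16))   # what the server selected
```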

I think we need to see the SchedLog. Try adding this to your configuration, then reconfig the Schedd and reproduce the problem.

SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY:2

This will produce a lot of output in the SchedLog, but I think we need the detailed logging to give us some clue why NTSSPI authentication is succeeding, but the username ends up being anonymous anyway.

-tj


From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Tuesday, January 21, 2025 2:56 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_rm job, Permission denied to force removal

Hi John,
What Condor version?
$CondorVersion: 24.2.2 2024-12-04 BuildID: 772905 GitSHA: 2b56256d $
$CondorPlatform: x86_64_Windows10 $

Can you submit new jobs to the schedd?
Yes

Are you logged in to the machine that the SCHEDD is running on? Or are you trying to remove jobs from a SCHEDD remotely? Some authorization methods only work locally.
Yes, but I would like to be able to remove jobs from a SCHEDD remotely eventually.

If you are running Condor version 24 or later, you can try

condor_rm 2.0 -debug:D_SECURITY
01/21/25 13:15:01 Win32 sysapi_get_network_device_info_raw()
01/21/25 13:15:01 SECMAN: command 478 ACT_ON_JOBS to <10.29.4.45:9618> from TCP port 49801 (blocking).
01/21/25 13:15:01 SECMAN: new session, doing initial authentication.
01/21/25 13:15:01 SECMAN: Auth methods: NTSSPI,PASSWORD
01/21/25 13:15:01 AUTHENTICATE: setting timeout for <10.29.4.45:9618?addrs=10.29.4.45-9618&alias=node1.company.com&noUDP&sock=schedd_15316_70f8> to 20.
01/21/25 13:15:01 HANDSHAKE: in handshake(my_methods = 'NTSSPI,PASSWORD')
01/21/25 13:15:01 HANDSHAKE: handshake() - i am the client
01/21/25 13:15:01 HANDSHAKE: sending (methods == 528) to server
01/21/25 13:15:01 HANDSHAKE: server replied (method = 16)
01/21/25 13:15:01 Authentication was a Success.
01/21/25 13:15:01 AUTHENTICATION: setting default map to (null)
01/21/25 13:15:01 AUTHENTICATION: post-map: current FQU is '(null)'
01/21/25 13:15:01 AUTHENTICATE: Exchanging keys with remote side.
01/21/25 13:15:01 AUTHENTICATE: Result of end of authenticate is 1.
01/21/25 13:15:01 SECMAN: generating AES key for session with <10.29.4.45:9618>...
01/21/25 13:15:01 SECMAN: successfully enabled encryption!
01/21/25 13:15:01 SECMAN: successfully enabled message authenticator!
01/21/25 13:15:01 SESSION: client duplicated AES to BLOWFISH key for UDP.
01/21/25 13:15:01 SECMAN: added session P01200537:17268:1737483301:11 to cache for 60 seconds (3600s lease).
01/21/25 13:15:01 SECMAN: startCommand succeeded.
01/21/25 13:15:01 DCSchedd:actOnJobs: Action failed

On Tue, Jan 21, 2025 at 12:21 PM John M Knoeller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
What Condor version?

Can you submit new jobs to the schedd?

Are you logged in to the machine that the SCHEDD is running on? Or are you trying to remove jobs from a SCHEDD remotely? Some authorization methods only work locally.

If you are running Condor version 24 or later, you can try

condor_rm 24.0 -debug:D_SECURITY

to get more detailed logging, but we probably need D_SECURITY logging from the SchedLog to see why it is not authenticating you.

-tj



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Andy Barr <ajbarr@xxxxxxxxx>
Sent: Sunday, January 19, 2025 8:02 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] condor_rm job, Permission denied to force removal

Hi,
I'm trying to remove jobs that are in the HOLD state in my condor pool. This is a small Windows-only pool that I am working on setting up. I am the owner of the job:

OWNER  BATCH_NAME  SUBMITTED    DONE  RUN  IDLE  HOLD  TOTAL  JOB_IDS
ajbarr ID: 24      12/13 17:18     _    _     _     1      1  24.0

I'm using the command,

condor_rm -force 24.0
Permission denied to force removal of job 24.0

Last, I get this error message in my SchedLog,
01/19/25 08:57:47 (pid:27872) QMGT command failed: anonymous user not permitted

So it seems that for some reason it thinks I'm an anonymous user?
From a DOS prompt I get:
whoami
company\ajbarr

I am able to successfully run jobs on this pool.

Thanks for your help,
Andy
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/