
Re: [HTCondor-users] condor_rm job, Permission denied to force removal



Ok.  I think I might see the problem.  

The Schedd thinks your identity is ajbarr@xxxxxxxxxxxxxxxxx, but NTSSPI authenticates you as ajbarr@COMPANY.

Even with case-insensitive prefix matching, these two names don't match. So when you run condor_rm, the Schedd can't find a user record for ajbarr@COMPANY and treats you as an "anonymous user": as far as it can tell, you are trying to remove jobs but have no jobs in the Schedd.
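
The mismatch can be illustrated with a small sketch. This is not HTCondor's actual matching code: `same_user` is a hypothetical helper that approximates "case-insensitive prefix matching" as "user parts equal ignoring case, and one domain is a prefix of the other ignoring case". The domain `node1.company.com` is an assumption standing in for the masked Schedd-side domain (it is the host alias that appears in the client log further down this thread).

```python
def split_identity(identity):
    """Split 'user@domain' into lower-cased (user, domain)."""
    user, _, domain = identity.partition("@")
    return user.lower(), domain.lower()


def same_user(a, b):
    """Hypothetical approximation of case-insensitive prefix matching."""
    user_a, dom_a = split_identity(a)
    user_b, dom_b = split_identity(b)
    if user_a != user_b:
        return False
    # Accept the pair if either domain is a prefix of the other.
    return dom_a.startswith(dom_b) or dom_b.startswith(dom_a)


# Case alone is not the problem:
print(same_user("ajbarr@company", "ajbarr@COMPANY"))            # True
# Assumed Schedd-side identity built from the machine's full hostname:
print(same_user("ajbarr@node1.company.com", "ajbarr@COMPANY"))  # False
```

Under these assumptions the upper/lower case difference is harmless, while a hostname-based domain and the NT Domain name can never match.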

To fix this, you need to make sure that the UID_DOMAIN configuration value in the Schedd matches the NTDOMAIN value for all users that will be using NTSSPI authentication.

Run 

condor_config_val -v UID_DOMAIN

on your schedd. I suspect it is set to the default, which is $(FULL_HOSTNAME). This is the correct default for Windows machines that are not part of an NT Domain, but for machines that are part of an NT Domain, UID_DOMAIN should be the same as the NT Domain name.

If you need to change the UID_DOMAIN of your Schedd, you should also set PRIOR_UID_DOMAIN to the old value in the configuration before you restart. That way, when the Schedd starts up, it should fix up ownership of jobs and user records that were using the old UID_DOMAIN value.
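
As a rough sketch, the change might look like this in the Schedd's configuration. Both values are assumptions for illustration: COMPANY is the NTDOMAIN that condor_qusers reports in this thread, and node1.company.com stands in for whatever condor_config_val showed for UID_DOMAIN before the change.

```
# Old value, as reported by condor_config_val before the change (assumed here)
PRIOR_UID_DOMAIN = node1.company.com

# New value: the NT Domain name that NTSSPI reports for your users
UID_DOMAIN = COMPANY
```

After restarting, running condor_config_val -v UID_DOMAIN again should show the new value and where it was set.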

-tj


From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Thursday, January 23, 2025 6:53 AM
To: John M Knoeller <johnkn@xxxxxxxxxxx>
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_rm job, Permission denied to force removal
 
Hi,
  condor_qusers
USER                             OWNER   NTDOMAIN   ENABLED MAX_RUN JOBS:Idle Running    Held Removed Completed
ajbarr@xxxxxxxxxxxxxxxxx         ajbarr  COMPANY     yes     default         0       0       1

Thanks,
Andy

On Wed, Jan 22, 2025 at 4:28 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
That seems like a reasonable guess.    

What does

    condor_qusers 

show as your full username?  

-tj


From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Wednesday, January 22, 2025 7:25 AM
To: John M Knoeller <johnkn@xxxxxxxxxxx>
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_rm job, Permission denied to force removal
 
Hi John,
Thanks for the help.  I have attached SchedLog output after adding SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY:2 and running the command, 
condor_rm 2.0

Could this be an upper/lower case issue?

One thing I see in the log is this,
AuthenticatedName = "ajbarr@COMPANY"

But after that I see this,
User = "ajbarr@company"

Note, I sanitized the log and replaced the hostnames with general names. 

Thanks,
Andy





On Tue, Jan 21, 2025 at 5:54 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
01/21/25 13:15:01 HANDSHAKE: in handshake(my_methods = 'NTSSPI,PASSWORD')
01/21/25 13:15:01 HANDSHAKE: handshake() - i am the client
01/21/25 13:15:01 HANDSHAKE: sending (methods == 528) to server
01/21/25 13:15:01 HANDSHAKE: server replied (method = 16)
01/21/25 13:15:01 Authentication was a Success.
01/21/25 13:15:01 AUTHENTICATION: setting default map to (null)
01/21/25 13:15:01 AUTHENTICATION: post-map: current FQU is '(null)'

This shows that NTSSPI (method bit 16) was the authentication method used, but for some reason the authenticated identity could not be converted to a username. (FQU means fully qualified user.)
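
The method numbers in the handshake are bit flags. Here is a small sketch of the arithmetic, using only values visible in the log above: 'NTSSPI,PASSWORD' was sent as 528 and the reply 16 was NTSSPI, which implies PASSWORD is 528 - 16 = 512. The table and helper names are illustrative only, not HTCondor's API.

```python
# Bit values inferred from the log: 528 = 16 (NTSSPI) + 512 (PASSWORD).
AUTH_METHOD_BITS = {"NTSSPI": 16, "PASSWORD": 512}


def encode_methods(names):
    """Combine method names into a single bitmask."""
    mask = 0
    for name in names:
        mask |= AUTH_METHOD_BITS[name]
    return mask


def decode_methods(mask):
    """List the known method names whose bit is set in the mask."""
    return [name for name, bit in AUTH_METHOD_BITS.items() if mask & bit]


print(encode_methods(["NTSSPI", "PASSWORD"]))  # 528, what the client offered
print(decode_methods(16))                      # ['NTSSPI'], what the server chose
```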

I think we need to see the SchedLog. Try adding this to your configuration, then reconfig the schedd and reproduce the problem.

SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY:2

This will produce a lot of output in the SchedLog, but I think we need the detailed logging to give us some clue why NTSSPI authentication is succeeding, but the username ends up being anonymous anyway.

-tj


From: Andy Barr <ajbarr@xxxxxxxxx>
Sent: Tuesday, January 21, 2025 2:56 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_rm job, Permission denied to force removal
 
Hi John,
What Condor version?   
$CondorVersion: 24.2.2 2024-12-04 BuildID: 772905 GitSHA: 2b56256d $
$CondorPlatform: x86_64_Windows10 $

Can you submit new jobs to the schedd? 
Yes

Are you logged in to the machine that the SCHEDD is running on, or are you trying to remove jobs from a SCHEDD remotely?  Some authentication methods only work locally.
Yes, but I would like to be able to remove jobs from a SCHEDD remotely eventually.

If you are running Condor version 24 or later, you can try

condor_rm 2.0 -debug:D_SECURITY
01/21/25 13:15:01 Win32 sysapi_get_network_device_info_raw()
01/21/25 13:15:01 SECMAN: command 478 ACT_ON_JOBS to <10.29.4.45:9618> from TCP port 49801 (blocking).
01/21/25 13:15:01 SECMAN: new session, doing initial authentication.
01/21/25 13:15:01 SECMAN: Auth methods: NTSSPI,PASSWORD
01/21/25 13:15:01 AUTHENTICATE: setting timeout for <10.29.4.45:9618?addrs=10.29.4.45-9618&alias=node1.company.com&noUDP&sock=schedd_15316_70f8> to 20.
01/21/25 13:15:01 HANDSHAKE: in handshake(my_methods = 'NTSSPI,PASSWORD')
01/21/25 13:15:01 HANDSHAKE: handshake() - i am the client
01/21/25 13:15:01 HANDSHAKE: sending (methods == 528) to server
01/21/25 13:15:01 HANDSHAKE: server replied (method = 16)
01/21/25 13:15:01 Authentication was a Success.
01/21/25 13:15:01 AUTHENTICATION: setting default map to (null)
01/21/25 13:15:01 AUTHENTICATION: post-map: current FQU is '(null)'
01/21/25 13:15:01 AUTHENTICATE: Exchanging keys with remote side.
01/21/25 13:15:01 AUTHENTICATE: Result of end of authenticate is 1.
01/21/25 13:15:01 SECMAN: generating AES key for session with <10.29.4.45:9618>...
01/21/25 13:15:01 SECMAN: successfully enabled encryption!
01/21/25 13:15:01 SECMAN: successfully enabled message authenticator!
01/21/25 13:15:01 SESSION: client duplicated AES to BLOWFISH key for UDP.
01/21/25 13:15:01 SECMAN: added session P01200537:17268:1737483301:11 to cache for 60 seconds (3600s lease).
01/21/25 13:15:01 SECMAN: startCommand succeeded.
01/21/25 13:15:01 DCSchedd:actOnJobs: Action failed

On Tue, Jan 21, 2025 at 12:21 PM John M Knoeller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
What Condor version?   

Can you submit new jobs to the schedd? 

Are you logged in to the machine that the SCHEDD is running on, or are you trying to remove jobs from a SCHEDD remotely?  Some authentication methods only work locally.

If you are running Condor version 24 or later, you can try

condor_rm 24.0 -debug:D_SECURITY 

to get more detailed logging, but we probably need D_SECURITY logging from the SchedLog to see why it is not authenticating you.

-tj



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Andy Barr <ajbarr@xxxxxxxxx>
Sent: Sunday, January 19, 2025 8:02 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] condor_rm job, Permission denied to force removal
 
Hi,
I'm trying to remove jobs that are in the HOLD state in my condor pool.  This is a small Windows-only pool that I am setting up.  I am the owner of the job:

OWNER   BATCH_NAME   SUBMITTED     DONE   RUN   IDLE   HOLD   TOTAL   JOB_IDS
ajbarr  ID: 24       12/13 17:18      _     _      _      1       1   24.0

I'm using the command,

condor_rm -force 24.0
Permission denied to force removal of job 24.0

Lastly, I get this error message in my SchedLog:
01/19/25 08:57:47 (pid:27872) QMGT command failed: anonymous user not permitted

So it seems that, for some reason, it thinks I'm an anonymous user?
From a DOS prompt I get,
whoami
company\ajbarr

I am able to successfully run jobs on this pool. 

Thanks for your help,
Andy
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/