
[HTCondor-users] Unable to edit jobs as QUEUE_SUPER_USER



Hey all,

 

I administer an HTCondor pool running on Windows 10/11 machines. Recently I moved the central manager (CM) to another machine in the pool.

On the new CM, accounts listed in QUEUE_SUPER_USERS in the configuration can no longer edit or remove jobs in the queue. When I (one of the QUEUE_SUPER_USERS) try to remove any job, I get the following (I stored my credentials with condor_store_cred beforehand):

 

 

C:\condor>condor_rm 208.0 -debug

10/10/25 16:30:42 Win32 sysapi_get_network_device_info_raw()

10/10/25 16:30:42 DCSchedd:actOnJobs: Action failed

 

Permission denied to remove job 208.0

 

 

I couldn't diagnose the problem from the full Schedd log [1] (with D_CAT D_FULLDEBUG D_SECURITY:2 turned on). The Schedd seems to authenticate my user correctly as MYUSER@USER_DOMAIN (or does it??), yet the QMGT command then fails with "anonymous user not permitted".

Even for queries, the Schedd does not recognize me as a QUEUE_SUPER_USER. condor_q lists only my own jobs, whereas before moving the CM it showed a summary of all jobs, as condor_q -all does.

 

 

C:\condor>condor_q

 

-- Schedd: CM_HOSTNAME.DOMAIN : <CM_IPADDR:9618?... @ 10/10/25 12:01:00

OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

 

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

Total for MYUSER: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

 

 

For debugging, I switched to a single-machine pool running HTCondor 25.0.1 with the same configuration as our main pool [2] and a local Startd. The problem is the same there.

 

Hopefully it is just our configuration that is broken. I'd appreciate any help diagnosing the problem.

 

Thanks in advance and best regards,

Frederik

 

 

 

[1]

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) SharedPort PipeListenerHelper got messages from Listener thread:

               16:30:42.379 SharedPortEndpoint: Pipe connected and pid 4460 sent

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) SharedPortEndpoint: Entered DoListenerAccept Win32 path.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DaemonCommandProtocol: Not enough bytes are ready for read.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: security session <CM_IPADDR:9618?addrs=CM_IPADDR-9618&alias=CM_HOSTNAME.DOMAIN&noUDP&sock=schedd_5172_b97a>#1760094042#41 lifetime expired.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: removed key id <CM_IPADDR:9618?addrs=CM_IPADDR-9618&alias=CM_HOSTNAME.DOMAIN&noUDP&sock=schedd_5172_b97a>#1760094042#41.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: security session CM_HOSTNAME:4460:1760106397:4 lifetime expired.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: removed key id CM_HOSTNAME:4460:1760106397:4.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: security session CM_HOSTNAME:4460:1760106512:5 lifetime expired.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: removed key id CM_HOSTNAME:4460:1760106512:5.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: security session admin_<CM_IPADDR:9618?addrs=CM_IPADDR-9618&alias=CM_HOSTNAME.DOMAIN&noUDP&sock=schedd_5172_b97a>#1760094042#37 lifetime expired.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_INVALIDATE_KEY: removed key id admin_<CM_IPADDR:9618?addrs=CM_IPADDR-9618&alias=CM_HOSTNAME.DOMAIN&noUDP&sock=schedd_5172_b97a>#1760094042#37.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: received DC_AUTHENTICATE from <CM_IPADDR:61619>

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: received following ClassAd:

AuthMethods = "NTSSPI,PASSWORD,KERBEROS,SSL"

Authentication = "REQUIRED"

AuthenticationNew = "OPTIONAL"

Command = 478

ConnectSinful = "<CM_IPADDR:9618?addrs=CM_IPADDR-9618&alias=CM_HOSTNAME.DOMAIN&noUDP&sock=schedd_5172_b97a>"

CryptoMethods = "AES,BLOWFISH,3DES"

ECDHPublicKey = "BNf9w1kfquugBuM/84AO3bt3t0D/aOrCM/nDfTpkRbM0R0nVdz3id3RMKncpEcs0aAfAjMf4rWkToHF2wXa0jyI="

Enact = "NO"

Encryption = "REQUIRED"

Integrity = "REQUIRED"

NegotiatedSession = true

NewSession = "YES"

OutgoingNegotiation = "REQUIRED"

RemoteVersion = "$CondorVersion: 25.0.1 2025-09-28 BuildID: 836952 GitSHA: 8e5a515a $"

ServerPid = 2656

SessionDuration = "60"

SessionLease = 3600

Subsystem = "TOOL"

TrustDomain = "DOMAIN"

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) Filtering authentication methods (NTSSPI,IDTOKENS,PASSWORD,KERBEROS,SSL) prior to offering them remotely.

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) Can try token auth because we have at least one named credential.

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) Will try IDTOKENS auth.

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) Not trying SSL auth; server is not ready.

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) Inserting pre-auth metadata for TOKEN.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: our_policy:

AuthMethods = "NTSSPI,TOKEN,PASSWORD,KERBEROS"

Authentication = "REQUIRED"

AuthenticationNew = "REQUIRED"

CryptoMethods = "AES,BLOWFISH,3DES"

Enact = "NO"

Encryption = "REQUIRED"

Integrity = "REQUIRED"

IssuerKeys = "LOCAL, POOL"

OutgoingNegotiation = "REQUIRED"

ParentUniqueID = "CM_HOSTNAME:5172:1760094037"

ServerPid = 4460

SessionDuration = "86400"

SessionLease = 3600

Subsystem = "SCHEDD"

TrustDomain = "DOMAIN"

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: the_policy:

AuthMethods = "NTSSPI"

AuthMethodsList = "NTSSPI,PASSWORD,KERBEROS"

Authentication = "YES"

CryptoMethods = "AES,BLOWFISH,3DES"

CryptoMethodsList = "AES,BLOWFISH,3DES"

Enact = "YES"

Encryption = "YES"

Integrity = "YES"

IssuerKeys = "LOCAL, POOL"

SessionDuration = "60"

SessionLease = 3600

TrustDomain = "DOMAIN"

10/10/25 16:30:42 (pid:4460) (D_SECURITY) SECMAN: Sending following response ClassAd:

AuthMethods = "NTSSPI"

AuthMethodsList = "NTSSPI,PASSWORD,KERBEROS"

Authentication = "YES"

CryptoMethods = "AES"

CryptoMethodsList = "AES,BLOWFISH,3DES"

ECDHPublicKey = "BGGKx4dYVDkxaI6F8YlwaDMfQANLi9JAMztLFLss26UgwvVnivgQSqsaEJx5jnGNzbBBnuaZEDMGTUd3wlgbjVw="

Enact = "YES"

Encryption = "YES"

Integrity = "YES"

IssuerKeys = "LOCAL, POOL"

NegotiatedSession = true

RemoteVersion = "$CondorVersion: 25.0.1 2025-09-28 BuildID: 836952 GitSHA: 8e5a515a $"

SessionDuration = "60"

SessionLease = 3600

TrustDomain = "DOMAIN"

10/10/25 16:30:42 (pid:4460) (D_SECURITY) SECMAN: new session, doing initial authentication.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) Returning to DC while we wait for socket to authenticate.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: authenticating RIGHT NOW.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: setting timeout for (unknown) to 20.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: in authenticate( addr == '(unknown)', methods == 'NTSSPI,PASSWORD,KERBEROS')

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: can still try these methods: NTSSPI,PASSWORD,KERBEROS

10/10/25 16:30:42 (pid:4460) (D_SECURITY) HANDSHAKE: in handshake(my_methods = 'NTSSPI,PASSWORD,KERBEROS')

10/10/25 16:30:42 (pid:4460) (D_SECURITY) HANDSHAKE: handshake() - i am the server

10/10/25 16:30:42 (pid:4460) (D_SECURITY) HANDSHAKE: client sent (methods == 592)

10/10/25 16:30:42 (pid:4460) (D_SECURITY) HANDSHAKE: i picked (method == 16)

10/10/25 16:30:42 (pid:4460) (D_SECURITY) HANDSHAKE: client received (method == 16)

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: will try to use 16 (NTSSPI)

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: do_authenticate is 1.

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) sspi_server_auth() entered

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) sspi_server_auth() looping

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) sspi_server_auth(): user name is: "MYUSER"

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) sspi_server_auth(): domain name is: "USER_DOMAIN"

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) sspi_server_auth() exiting

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: auth_status == 16 (NTSSPI)

10/10/25 16:30:42 (pid:4460) (D_SECURITY) Authentication was a Success.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATION: setting default map to MYUSER@USER_DOMAIN

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) AUTHENTICATION: post-map: current user is 'MYUSER'

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) AUTHENTICATION: post-map: current domain is 'USER_DOMAIN'

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATION: post-map: current FQU is 'MYUSER@USER_DOMAIN'

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: Exchanging keys with remote side.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATE: Result of end of authenticate is 1.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: authentication of CM_IPADDR complete.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: generating AES key for session CM_HOSTNAME:4460:1760106642:6...

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) CRYPTO: New crypto state with protocol AES

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: encryption enabled for session CM_HOSTNAME:4460:1760106642:6

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) SECMAN: because protocal is AES, not using other MAC.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: message authenticator enabled with key id CM_HOSTNAME:4460:1760106642:6.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: Success.

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) IPVERIFY: checking CM_HOSTNAME.DOMAIN against CM_IPADDR addrs are:

               CM_IPADDR

               CM_IPADDRv6

10/10/25 16:30:42 (pid:4460) (D_SECURITY) IPVERIFY: for CM_HOSTNAME.DOMAIN matched CM_IPADDR to CM_IPADDR

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) IPVERIFY: matched user MYUSER@USER_DOMAIN from *.DOMAIN to allow list

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) Adding to resolved authorization table: MYUSER@USER_DOMAIN/CM_IPADDR: WRITE

10/10/25 16:30:42 (pid:4460) (D_ALWAYS) PERMISSION GRANTED to MYUSER@USER_DOMAIN from host CM_IPADDR for command 478 (ACT_ON_JOBS), access level WRITE: reason: WRITE authorization policy allows hostname CM_HOSTNAME.DOMAIN; identifiers used for this remote host: CM_IPADDR,CM_HOSTNAME.DOMAIN

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: sending session ad:

ReturnCode = "AUTHORIZED"

Sid = "CM_HOSTNAME:4460:1760106642:6"

TriedAuthentication = true

User = "MYUSER@USER_DOMAIN"

ValidCommands = "60021,60052,421,478,480,486,488,489,487,499,531,464,479,541,542,1112,509,511,526,527,528,521,507,60007,457,60020,550,443,441,6,12,5,515,516,519,540,1111"

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: sent session CM_HOSTNAME:4460:1760106642:6 info!

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) SESSION: fallback crypto method would be BLOWFISH.

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) SESSION: server checking key type: 3

10/10/25 16:30:42 (pid:4460) (D_SECURITY:2) SESSION: found list: AES,BLOWFISH,3DES.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) SESSION: server duplicated AES to BLOWFISH key for UDP.

10/10/25 16:30:42 (pid:4460) (D_SECURITY) DC_AUTHENTICATE: added incoming session id CM_HOSTNAME:4460:1760106642:6 to cache for 80 seconds (lease is 3620s, return address is ).

AuthMethods = "NTSSPI"

AuthMethodsList = "NTSSPI,PASSWORD,KERBEROS"

AuthenticatedName = "MYUSER@USER_DOMAIN"

Authentication = "YES"

CryptoMethods = "AES"

CryptoMethodsList = "AES,BLOWFISH,3DES"

Enact = "YES"

Encryption = "YES"

Integrity = "YES"

IssuerKeys = "LOCAL, POOL"

NegotiatedSession = true

RemoteVersion = "$CondorVersion: 25.0.1 2025-09-28 BuildID: 836952 GitSHA: 8e5a515a $"

ServerPid = 2656

SessionDuration = "60"

SessionLease = 3600

Sid = "CM_HOSTNAME:4460:1760106642:6"

Subsystem = "TOOL"

TriedAuthentication = true

TrustDomain = "DOMAIN"

User = "MYUSER@USER_DOMAIN"

ValidCommands = "60021,60052,421,478,480,486,488,489,487,499,531,464,479,541,542,1112,509,511,526,527,528,521,507,60007,457,60020,550,443,441,6,12,5,515,516,519,540,1111"

10/10/25 16:30:42 (pid:4460) (D_ALWAYS) QMGT command failed: anonymous user not permitted

10/10/25 16:30:42 (pid:4460) (D_ALWAYS:2) actOnJobs: didn't do any work, aborting
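
The log above is long, and only a few lines carry the contradiction described in the post: the identity after mapping, the WRITE authorization, and the QMGT failure. A minimal Python sketch for pulling those lines out of a Schedd log could look like the following. The regular expressions are assumptions based only on the message formats visible in this excerpt, and the function name summarize_schedd_log is made up for illustration; none of this is an HTCondor API.

```python
import re

# Patterns copied from the message formats seen in the excerpt above.
# They are assumptions about the log text, not a documented interface.
FQU_RE = re.compile(r"post-map: current FQU is '([^']+)'")
GRANT_RE = re.compile(
    r"PERMISSION GRANTED to (\S+) from host \S+ for command (\d+) \((\w+)\)"
)
QMGT_RE = re.compile(r"QMGT command failed: (.+)$")


def summarize_schedd_log(lines):
    """Collect mapped identities, granted commands and QMGT failures."""
    summary = {"fqu": [], "granted": [], "qmgt_failures": []}
    for raw in lines:
        line = raw.rstrip()
        m = FQU_RE.search(line)
        if m:
            summary["fqu"].append(m.group(1))
            continue
        m = GRANT_RE.search(line)
        if m:
            # (user, command number, command name)
            summary["granted"].append((m.group(1), int(m.group(2)), m.group(3)))
            continue
        m = QMGT_RE.search(line)
        if m:
            summary["qmgt_failures"].append(m.group(1))
    return summary


# Three lines taken from the excerpt above (the second one is cut short).
SAMPLE = [
    "10/10/25 16:30:42 (pid:4460) (D_SECURITY) AUTHENTICATION: post-map: "
    "current FQU is 'MYUSER@USER_DOMAIN'",
    "10/10/25 16:30:42 (pid:4460) (D_ALWAYS) PERMISSION GRANTED to "
    "MYUSER@USER_DOMAIN from host CM_IPADDR for command 478 (ACT_ON_JOBS), "
    "access level WRITE",
    "10/10/25 16:30:42 (pid:4460) (D_ALWAYS) QMGT command failed: "
    "anonymous user not permitted",
]

if __name__ == "__main__":
    for key, values in summarize_schedd_log(SAMPLE).items():
        print(key, values)
```

To run it against a real log, pass an open file object instead of SAMPLE. The sketch only surfaces the lines; it does not explain why an authenticated, authorized user ends up treated as anonymous at the QMGT step.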

 

 

 

[2]

--- condor_config ---

use SECURITY : recommended(SYSTEM, Administrator@*, MYUSER@*)

 

INSTALL_USER = MYUSER

CONDOR_HOST = CM_HOSTNAME.DOMAIN

UID_DOMAIN = DOMAIN

JAVA = C:\Java\1708~1.1-1\bin\java.exe

use POLICY : ALWAYS_RUN_JOBS

 

LOCAL_CONFIG_FILE             = $(LOCAL_DIR)/condor_config.local

LOCAL_CONFIG_DIR              = $(LOCAL_DIR)/config

 

UID_DOMAIN = DOMAIN

TRUST_UID_DOMAIN = True

FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)

 

COLLECTOR_NAME = MyCollector

COLLECTOR_HOST  = $(CONDOR_HOST)

CREDD_HOST = $(CONDOR_HOST)

 

CREDD_CACHE_LOCALLY = True

 

ALLOW_ADMINISTRATOR = MYUSER@USER_DOMAIN/*, OTHERUSER@USER_DOMAIN/*, ANOTHERUSER@USER_DOMAIN/*

ALLOW_OWNER = $(ALLOW_ADMINISTRATOR), $(FULL_HOSTNAME)

ALLOW_CONFIG = $(ALLOW_ADMINISTRATOR)

 

QUEUE_SUPER_USERS = Administrator, MYUSER@USER_DOMAIN, MYUSER, OTHERUSER@USER_DOMAIN, ANOTHERUSER@USER_DOMAIN, FURTHERUSER@USER_DOMAIN

QUEUE_SUPER_USER_MAY_IMPERSONATE = svc_e20_htcondor

 

ALLOW_READ = *.DOMAIN

ALLOW_WRITE = *.DOMAIN

 

ALLOW_NEGOTIATOR = $(CONDOR_HOST)

 

ALLOW_DAEMON = */*.DOMAIN

ALLOW_ADVERTISE_MASTER = */*.DOMAIN

ALLOW_ADVERTISE_STARTD = */*.DOMAIN

 

# To execute just one task at a time:

NUM_SLOTS = 1

NUM_SLOTS_TYPE_1 = 1

SLOT_TYPE_1 = 100%

SLOT_TYPE_1_PARTITIONABLE = FALSE

 

SETUP_HOOK_PREPARE_JOB = C:\condor\hooks\setup.bat

 

SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_CAT D_FULLDEBUG D_SECURITY:2

 

 

--- condor_config.local ---

use ROLE : CentralManager

use ROLE : Submit

use ROLE : Execute

 

DAEMON_LIST = $(DAEMON_LIST) CREDD
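
As a quick sanity check of the configuration above, the following Python sketch reports settings that are assigned more than once, whether the authenticated identity from the log appears literally in QUEUE_SUPER_USERS, and whether its domain part equals UID_DOMAIN. It is a rough sketch under stated assumptions: it only understands plain "NAME = value" lines, it takes the last assignment as the effective one, and it uses a literal string comparison that may differ from how HTCondor itself matches QUEUE_SUPER_USERS entries. The function names are made up for illustration, and the domain comparison is an observation only, not a diagnosis.

```python
from collections import defaultdict


def parse_assignments(text):
    """Return {NAME: [values in file order]} for simple 'NAME = value' lines."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and 'use CATEGORY : template' lines.
        if not line or line.startswith("#") or line.lower().startswith("use "):
            continue
        name, sep, value = line.partition("=")
        if sep:
            seen[name.strip().upper()].append(value.strip())
    return dict(seen)


def split_list(value):
    """Split a comma-separated config value into trimmed entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check(config_text, fqu):
    """Run the three checks described above and return them as a dict."""
    seen = parse_assignments(config_text)
    # Assumption: the last assignment in the file is the effective one.
    super_users = split_list(seen.get("QUEUE_SUPER_USERS", [""])[-1])
    uid_domain = seen.get("UID_DOMAIN", [""])[-1]
    return {
        "duplicates": {k: v for k, v in seen.items() if len(v) > 1},
        "fqu_listed": fqu in super_users,  # literal comparison only
        "fqu_domain_equals_uid_domain": fqu.partition("@")[2] == uid_domain,
    }


# A subset of the condor_config shown above.
CONFIG = """\
use SECURITY : recommended(SYSTEM, Administrator@*, MYUSER@*)
UID_DOMAIN = DOMAIN
UID_DOMAIN = DOMAIN
TRUST_UID_DOMAIN = True
QUEUE_SUPER_USERS = Administrator, MYUSER@USER_DOMAIN, MYUSER, OTHERUSER@USER_DOMAIN, ANOTHERUSER@USER_DOMAIN, FURTHERUSER@USER_DOMAIN
"""

# The identity the Schedd log reports after mapping.
AUTHENTICATED_FQU = "MYUSER@USER_DOMAIN"

if __name__ == "__main__":
    for key, value in check(CONFIG, AUTHENTICATED_FQU).items():
        print(key, value)
```

On the subset above, the sketch flags UID_DOMAIN as assigned twice (with the same value) and finds the authenticated name in the list, so a simple typo in QUEUE_SUPER_USERS does not look like the cause.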