
Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes



Hello, Maarten.


The core of the problem seems to be that FS authentication is not working properly and the user is authenticated as “condor@xxxxxxx”. 



Could you please check the condor_ping output when run as user alicesgm?

----

condor_ping -debug -verbose -type schedd WRITE


....

Authenticated using:         FS

All authentication methods: TOKEN,FS

....

-------
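To compare several condor_ping runs (for example as alicesgm and as root), a small shell filter can pull individual fields out of the -verbose output. This is a minimal POSIX sh sketch. The function name ping_field is hypothetical, and the function assumes each field is printed as `Name: value` on one line, as in the output above.

```shell
# Minimal sketch (POSIX sh): print the value of one "Name: value" field
# from `condor_ping -verbose` output read on stdin.
# ping_field is a hypothetical helper name, not an HTCondor tool.
ping_field() {
  # Split on ':' so $1 is the field name; on a match, strip the name,
  # the first colon and the padding after it, print the rest, and stop.
  awk -F':' -v f="$1" '$1 == f { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}
```

As alicesgm, for example: `condor_ping -verbose -type schedd WRITE | ping_field 'Authenticated using'`. Running the same command as root and comparing the results can show whether only the unprivileged user ends up with a different method.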


First, check the mount options and permissions of the /tmp directory. Either the alicesgm account may be unable to write to /tmp, or there may be an SELinux issue.
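HTCondor's FS method works by having the client create a file in a local directory, and the daemon then checks who owns that file. A quick test of the /tmp theory is therefore to create and remove a file there as alicesgm. This is a minimal sketch that assumes /tmp is the directory in use (the default of FS_LOCAL_DIR). The helper names probe_write and show_dir_info are hypothetical.

```shell
# Sketch (POSIX sh), assuming FS authentication uses /tmp, the default
# for FS_LOCAL_DIR. probe_write and show_dir_info are hypothetical names.

# Succeed if the current user can create (and remove) a file in $1.
probe_write() {
  dir=${1:-/tmp}
  probe=$(mktemp "$dir/fs_probe.XXXXXX") || return 1
  rm -f "$probe"
}

# Print owner/mode and mount options of $1 (findmnt is from util-linux).
show_dir_info() {
  dir=${1:-/tmp}
  ls -ld "$dir"                          # /tmp is normally drwxrwxrwt
  findmnt -n -o OPTIONS --target "$dir"  # look for unusual options here
}
```

As alicesgm: `probe_write /tmp && echo writable`. As any user: `show_dir_info /tmp`.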


If you suspect SELinux, compare your system with the output below to see whether anything is missing.


[root@ui20 tmp]# semanage permissive -l


Builtin Permissive Types 


condor_negotiator_t

condor_master_t

condor_collector_t

condor_procd_t

condor_startd_t

condor_schedd_t


As far as I know, if condor_schedd_t is missing from this list, SELinux can block the schedd, because actions by domains that are not registered as permissive may be denied.
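To check the permissive list mechanically, the sketch below reads `semanage permissive -l` output and prints any of the condor domains from the listing above that are missing. It is a POSIX sh sketch. The domain list is copied from that listing and may not match every policy version, and missing_permissive is a hypothetical name.

```shell
# Sketch (POSIX sh): read `semanage permissive -l` output on stdin and print
# each condor domain from the list below that does not appear in it.
# The domain list is copied from the output above; missing_permissive is a
# hypothetical helper name.
missing_permissive() {
  listing=$(cat)
  for t in condor_negotiator_t condor_master_t condor_collector_t \
           condor_procd_t condor_startd_t condor_schedd_t; do
    # -x: the whole line must be the domain name, ignoring surrounding blanks
    printf '%s\n' "$listing" | grep -qx "[[:space:]]*$t[[:space:]]*" || echo "$t"
  done
}
```

As root: `semanage permissive -l | missing_permissive`. If SELinux is enforcing, you can list recent denials with `ausearch -m AVC -ts recent`.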



Also, could you check whether the account has an IDTOKEN issued as "condor@xxxxxxx"?


You can check this by running condor_token_list as the alicesgm account.
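You can also pull the token subjects directly out of the condor_token_list output. This is a minimal sketch that assumes the one-line `Header: ... Payload: {...}` format shown later in this thread. The names token_subjects and has_condor_token are hypothetical.

```shell
# Sketch (POSIX sh): print the "sub" claim of each token in
# condor_token_list output read on stdin.
token_subjects() {
  sed -n 's/.*"sub":"\([^"]*\)".*/\1/p'
}

# Succeed if any listed token is issued to a condor@ identity.
has_condor_token() {
  token_subjects | grep -q '^condor@'
}
```

As alicesgm: `condor_token_list | token_subjects`, or `condor_token_list | has_condor_token && echo 'condor@ token found'`.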


Regards,


-- Geonmo

------ Original Message ------

From: Maarten Litmaath via HTCondor-users <htcondor-users@xxxxxxxxxxx>

To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>

Cc: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>

Date: 2025-02-11 (Tue) 09:13:56

Subject: Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes




Hi Cole & Geonmo,
here are my answers:
[alicesgm@htc24s-ce ~]$ condor_submit -debug D_SECURITY my-test.jdl 
Submitting job(s)02/11/25 00:41:50 Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 13 (Permission denied)
02/11/25 00:41:50 Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 13 (Permission denied)
.
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
[alicesgm@htc24s-ce ~]$ 

[alicesgm@htc24s-ce ~]$ condor_config_val -verbose ALLOW_WRITE SCHEDD_NAME \
QUEUE_SUPER_USERS SEC_DEFAULT_AUTHENTICATION_METHODS \
SEC_DAEMON_AUTHENTICATION_METHODS SEC_CLIENT_AUTHENTICATION_METHODS \
ALLOW_DAEMON TRUST_DOMAIN

ALLOW_WRITE = *
 # at: /etc/condor/config.d/01-submit.config, line 3, use SECURITY:recommended_v24_0+12
 # raw: ALLOW_WRITE = *

Not defined: SCHEDD_NAME
 # at: <Default>
 # raw: SCHEDD_NAME = 

QUEUE_SUPER_USERS = root, condor
 # at: <Default>
 # raw: QUEUE_SUPER_USERS = root, condor

SEC_DEFAULT_AUTHENTICATION_METHODS = FS,IDTOKENS,KERBEROS,SCITOKENS,SSL
 # at: <Default>
 # raw: SEC_DEFAULT_AUTHENTICATION_METHODS = FS,IDTOKENS,KERBEROS,SCITOKENS,SSL

Not defined: SEC_DAEMON_AUTHENTICATION_METHODS

SEC_CLIENT_AUTHENTICATION_METHODS = FS,IDTOKENS,KERBEROS,SCITOKENS,SSL,ANONYMOUS
 # at: /etc/condor/config.d/01-submit.config, line 3, use SECURITY:get_htcondor_idtokens+9
 # raw: SEC_CLIENT_AUTHENTICATION_METHODS = $(SEC_DEFAULT_AUTHENTICATION_METHODS),ANONYMOUS

ALLOW_DAEMON = condor@*  condor@password
 # at: /etc/condor/config.d/01-submit.config, line 3, use SECURITY:recommended_v24_0+10
 # raw: ALLOW_DAEMON = condor@*  condor@password

TRUST_DOMAIN = htc24s-cm.cern.ch
 # at: /etc/condor/config.d/01-submit.config, line 3, use SECURITY:get_htcondor_idtokens+20
 # raw: TRUST_DOMAIN = $(CONDOR_HOST)

[alicesgm@htc24s-ce ~]$ 

[root@htc24s-ce ~]# condor_token_list && condor_token_list -dir /etc/condor-ce/tokens.d
Header: {"alg":"HS256","kid":"POOL"} Payload: {"iat":1739038309,"iss":"htc24s-cm.cern.ch","jti":"b5175124e4a8e4c41d4141e25e0b0633","sub":"condor@xxxxxxxxxxxxxxxxx"} File: /etc/condor/tokens.d/condor@xxxxxxxxxxxxxxxxx
Header: {"alg":"HS256","kid":"POOL"} Payload: {"iat":1739038309,"iss":"htc24s-cm.cern.ch","jti":"b5175124e4a8e4c41d4141e25e0b0633","sub":"condor@xxxxxxxxxxxxxxxxx"} File: /etc/condor-ce/tokens.d/condor@xxxxxxxxxxxxxxxxx
[root@htc24s-ce ~]# 



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Monday, February 10, 2025 4:40 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Cole Bollig <cabollig@xxxxxxxx>
Subject: Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes
 
Hi Maarten,

In addition to the information Geonmo asked you to check: is SCHEDD_HOST defined in the configuration (condor_config_val -v SCHEDD_HOST)? And when a job submission succeeds, who is the Owner in the job ClassAd(s)?

It might also help to compare the output of a successful and a failed submission when running condor_submit -debug D_SECURITY <submit file>.
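One way to make that comparison is to save each attempt's output and diff the two files with the leading timestamps removed, so only the real differences remain. This is a POSIX sh sketch. The names submit_debug, strip_ts and diff_logs are hypothetical, and the timestamp pattern assumes the `MM/DD/YY HH:MM:SS` prefix seen in the debug output elsewhere in this thread.

```shell
# Sketch (POSIX sh); all function names are hypothetical helpers.

# Run one debug submission of $1, saving stdout and stderr to $2.
submit_debug() {
  condor_submit -debug D_SECURITY "$1" > "$2" 2>&1
}

# Drop a leading "MM/DD/YY HH:MM:SS " timestamp from each line on stdin.
strip_ts() {
  sed -E 's#^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##'
}

# Diff two saved logs with timestamps removed; returns diff's status.
diff_logs() {
  a=$(mktemp) && b=$(mktemp) || return 2
  strip_ts < "$1" > "$a"
  strip_ts < "$2" > "$b"
  rc=0
  diff "$a" "$b" || rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

For example, repeat `submit_debug my-test.jdl fail.log` until one attempt fails and another succeeds (saving the successful one as ok.log), then run `diff_logs fail.log ok.log`.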

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Geonmo <geonmo@xxxxxxxxxxx>
Sent: Sunday, February 9, 2025 8:00 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes
 

Hello, Maarten.


Could you share the values of some configuration variables using condor_config_val?


Could you also share your IDTOKEN information?

[As root: condor_token_list && condor_token_list -dir /etc/condor-ce/tokens.d]


The error we experienced was a little different from the message you showed: the identity in the IDTOKEN used by the HTCondor-CE daemon was not in HTCondor's ALLOW_WRITE list, so the submission was rejected.


I solved it by simply overwriting the IDTOKENS in /etc/condor/tokens.d/ with those in /etc/condor-ce/tokens.d/, but I am not sure whether that is the right solution.
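Before and after copying tokens between the two directories, it can help to see which token files actually differ. This is a minimal POSIX sh sketch that compares files byte for byte with cmp. Run it as root, because the token directories are typically readable only by root. The name compare_token_dirs is hypothetical.

```shell
# Sketch (POSIX sh): print the name of every token file that is missing
# from one of the two directories or whose contents differ.
# No output means both directories hold identical token files.
compare_token_dirs() {
  for f in "$1"/* "$2"/*; do
    [ -e "$f" ] || continue          # skip an unexpanded glob (empty dir)
    name=$(basename "$f")
    cmp -s "$1/$name" "$2/$name" || echo "$name"
  done | sort -u
}
```

As root: `compare_token_dirs /etc/condor/tokens.d /etc/condor-ce/tokens.d`.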


However, that was an issue with submitting jobs to HTCondor via HTCondor-CE, and it does not explain why direct submission to HTCondor fails.


Regards,


-- Geonmo




------ Original Message ------

From: Maarten Litmaath via HTCondor-users <htcondor-users@xxxxxxxxxxx>

To: "htcondor-users@xxxxxxxxxxx" <htcondor-users@xxxxxxxxxxx>

Cc: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>

Date: 2025-02-09 (Sun) 22:19:58

Subject: Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes




Hi again,
with an HTCondor-CE also installed on the Submit Node,
jobs are accepted by the CE but refused by the Submit Node's Schedd:

02/09/25 14:07:15 (pid:10651) SetEffectiveOwner security violation: 
attempting to set owner to dis-allowed value alicesgm@xxxxxxxxxxxxxxxxx
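To see how often, and for which owners, the schedd rejects these requests, you can count the matching lines in the SchedLog. This sketch assumes each message sits on one line in the log (it is wrapped in the mail above). The name disallowed_owners is hypothetical, and SCHEDD_LOG is the configuration variable that names the schedd's log file.

```shell
# Sketch (POSIX sh): count SetEffectiveOwner rejections per owner value
# in the given log file(s). disallowed_owners is a hypothetical name.
disallowed_owners() {
  sed -n 's/.*SetEffectiveOwner security violation: attempting to set owner to dis-allowed value \([^ ]*\).*/\1/p' "$@" |
    sort | uniq -c
}
```

For example: `disallowed_owners "$(condor_config_val SCHEDD_LOG)"`.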

Further advice would be appreciated, thanks!



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Maarten Litmaath via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Sunday, February 9, 2025 1:35 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>
Subject: Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes
 
Hi again,
using the current version in the Feature Channel, v24.4.0, all works fine,
while the LTS Channel has the problem described below.

We do not want to advise our sites to switch to the Feature Channel,
because we usually prefer the stability of the LTS Channel...

The two Channels appear to have some unintended difference,
for which I have not yet found a clue in the Feature Channel release notes...



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Maarten Litmaath via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Sunday, February 9, 2025 12:30 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>
Subject: [HTCondor-users] v24.0.4 condor_submit only works sometimes
 
Dear HTCondor experts,
I have set up a v24.0.4 mini cluster on Alma 9 using the Admin Quick Start Guide:

https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html

As an unprivileged user on the Submit Node, condor_submit fails as shown:

======================================================================
[alicesgm@htc24s-ce ~]$ cat my-test.jdl 
cmd = my-test.sh
output = my-test.out.$(ClusterId)
error  = my-test.err.$(ClusterId)
log = my-test.log.$(ClusterId)
+MaxMemory = 50
queue 1
[alicesgm@htc24s-ce ~]$ condor_submit my-test.jdl 
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
[alicesgm@htc24s-ce ~]$ 
======================================================================

If I keep trying, though, eventually it works:

======================================================================
[alicesgm@htc24s-ce ~]$ for i in `seq 30`; do condor_submit my-test.jdl &&
 break; sleep 61; done &>> log-$$.txt < /dev/null &
[1] 33484
[alicesgm@htc24s-ce ~]$ tail -f log-$$.txt
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
1 job(s) submitted to cluster 19.
======================================================================
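The retry loop above can also be written as a small reusable function, which makes it easier to see how many attempts each submission needs. This is a POSIX sh sketch. The name retry is hypothetical, and unlike the original loop it skips the sleep after the final attempt.

```shell
# Sketch (POSIX sh): run "$@" up to $1 times, sleeping $2 seconds between
# failed attempts. Returns 0 on the first success, 1 if all attempts fail.
retry() {
  tries=$1
  pause=$2
  shift 2
  n=1
  while [ "$n" -le "$tries" ]; do
    if "$@"; then
      echo "attempt $n succeeded" >&2
      return 0
    fi
    if [ "$n" -lt "$tries" ]; then
      sleep "$pause"
    fi
    n=$((n + 1))
  done
  return 1
}
```

`retry 30 61 condor_submit my-test.jdl` mirrors the loop above (up to 30 attempts, 61 seconds apart). It only works around the intermittent failure and does not address its cause.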

That job then runs fine, while the next job submission will fail again, etc.

There appear to be two problems here:

1) The Admin Quick Start Guide gives me a cluster that does not work.

2) Due to some bug, job submissions sometimes get through nonetheless.

Advice would be appreciated, thanks!



