
Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes



Hi again,
with an HTCondor CE additionally installed on the Submit Node,
jobs are accepted by the CE but refused by the latter's Schedd:

02/09/25 14:07:15 (pid:10651) SetEffectiveOwner security violation: attempting to set owner to dis-allowed value alicesgm@xxxxxxxxxxxxxxxxx

Further advice would be appreciated, thanks!



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Maarten Litmaath via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Sunday, February 9, 2025 1:35 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>
Subject: Re: [HTCondor-users] v24.0.4 condor_submit only works sometimes
 
Hi again,
using the current version in the Feature Channel, v24.4.0, everything works fine,
while the LTS Channel has the problem described below.

We do not want to advise our sites to switch to the Feature Channel,
because we usually prefer the stability of the LTS Channel...

The two Channels appear to differ in some unintended way,
for which I have not yet found a clue in the Feature Channel release notes...



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Maarten Litmaath via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Sunday, February 9, 2025 12:30 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>
Subject: [HTCondor-users] v24.0.4 condor_submit only works sometimes
 
Dear HTCondor experts,
I have set up a v24.0.4 mini cluster on Alma 9 using the Admin Quick Start Guide:

https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html

As an unprivileged user on the Submit Node, condor_submit fails as shown:

======================================================================
[alicesgm@htc24s-ce ~]$ cat my-test.jdl 
cmd = my-test.sh
output = my-test.out.$(ClusterId)
error  = my-test.err.$(ClusterId)
log = my-test.log.$(ClusterId)
+MaxMemory = 50
queue 1
[alicesgm@htc24s-ce ~]$ condor_submit my-test.jdl 
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
[alicesgm@htc24s-ce ~]$ 
======================================================================

If I keep trying, though, eventually it works:

======================================================================
[alicesgm@htc24s-ce ~]$ for i in `seq 30`; do condor_submit my-test.jdl &&
 break; sleep 61; done &>> log-$$.txt < /dev/null &
[1] 33484
[alicesgm@htc24s-ce ~]$ tail -f log-$$.txt
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
1 job(s) submitted to cluster 19.
======================================================================

That job then runs fine, while the next job submission will fail again, etc.
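
For anyone who wants to reproduce the intermittent behaviour, the retry loop above could be wrapped in a small shell function along these lines. This is only a sketch: the name retry_until_success is a hypothetical helper, not part of HTCondor, and the attempt count and delay simply mirror the 30 tries and 61 seconds used above.

```shell
#!/bin/sh
# Hypothetical helper (not an HTCondor tool): retry a command until it
# succeeds or the attempt limit is reached.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_success() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given; stop at the first success.
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt failed" >&2
        # Sleep only between attempts, not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "giving up after $max_attempts attempts" >&2
    return 1
}

# Example invocation matching the loop above (left commented out so the
# snippet can be sourced on a machine without HTCondor installed):
# retry_until_success 30 61 condor_submit my-test.jdl
```

Reporting the attempt number makes it easier to see whether successful submissions follow any pattern over time.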

There appear to be two problems here:

1) The Admin Quick Start Guide gives me a cluster that does not work.

2) Due to some bug, job submissions sometimes get through nonetheless.

Advice would be appreciated, thanks!