Dear HTCondor experts,
I have set up a v24.0.4 mini cluster on Alma 9 following the Admin Quick Start Guide.
As an unprivileged user on the Submit Node, condor_submit fails as shown:
======================================================================
[alicesgm@htc24s-ce ~]$ cat my-test.jdl
cmd = my-test.sh
output = my-test.out.$(ClusterId)
error = my-test.err.$(ClusterId)
log = my-test.log.$(ClusterId)
+MaxMemory = 50
queue 1
[alicesgm@htc24s-ce ~]$ condor_submit my-test.jdl
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
[alicesgm@htc24s-ce ~]$
======================================================================
If I keep trying, though, eventually it works:
======================================================================
[alicesgm@htc24s-ce ~]$ for i in `seq 30`; do condor_submit my-test.jdl && break; sleep 61; done &>> log-$$.txt < /dev/null &
[1] 33484
[alicesgm@htc24s-ce ~]$ tail -f log-$$.txt
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: Failed to create new User record for condor@xxxxxxxx
Submitting job(s).
1 job(s) submitted to cluster 19.
======================================================================
That job then runs fine, while the next job submission will fail again, etc.
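For anyone who wants to reproduce this, the retry one-liner above can be sketched as a small POSIX shell function. The name `retry` and its argument order are my own choices for illustration and are not part of HTCondor; the 30 attempts and 61-second delay are simply the values I used above. Unlike the one-liner, this sketch does not sleep after the final failed attempt.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Hypothetical helper: run CMD up to MAX times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if every
# attempt failed. Progress messages go to stderr.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt" >&2
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        # Only wait if another attempt will follow.
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage matching the loop above (30 tries, 61 seconds apart):
#   retry 30 61 condor_submit my-test.jdl
```

This is only a workaround for observing the behavior; it does not explain why the submission fails in the first place.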
There appear to be two problems here:
1) The Admin Quick Start Guide gives me a cluster that does not work.
2) Due to some bug, job submissions sometimes get through nonetheless.
Advice would be appreciated, thanks!