Hi,
Just curious if there was a problem found here? I've noticed something similar when testing release > 10.5 on el9 (tested 10.7 and 23.0). For me jobs go on hold immediately with:
"Error from slot1_1@xxxxxxxxxxxxxxxxxxx: Failed to execute '/pool/condor/dir_2355954/condor_pid_ns_init' with arguments /afs/cern.ch/user/b/bejones/tmp/condor/hello.sh hello: (errno=2: 'No such file or directory')"
Looks like it's looking for condor_pid_ns_init in the sandbox rather than in LIBEXEC. Same config on the EP works on 10.5. Are we missing some config?
Cheers,
Ben
On 20 Sep 2023, at 16:11, Tim Theisen via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
Hello Carles,
We do not know of any issues with PID namespaces. However, it is
possible that it is no longer working properly. We will try to
reproduce the problem here.
...Tim
On 9/19/23 08:09, Carles Acosta wrote:
Hello again,
I updated the CE and the testing WNs on Alma9 to HTCondor
10.8.0, but the jobs continued to fail. So my last option was
to set USE_PID_NAMESPACES to false on the Alma9 WNs.
After that, the CE jobs started to run again.
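For reference, the workaround described above might look like the following on an affected execute node. This is only a minimal sketch: the file name 99-disable-pid-ns.conf is a hypothetical choice, and only the USE_PID_NAMESPACES knob itself comes from this thread.

```
# /etc/condor/config.d/99-disable-pid-ns.conf  (hypothetical file name)
# Workaround only: run jobs without a private PID namespace.
USE_PID_NAMESPACES = False
```

Running condor_reconfig on the node afterwards should make the change apply to newly started jobs.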
Is there an issue introduced in HTCondor 10.6.0 with
AlmaLinux 9 - CE jobs and pid namespaces?
As I mentioned, this issue started with HTCondor version
10.6.0 on AlmaLinux 9 WNs and apparently only affects jobs
routed from a CE.
Cheers,
Carles
Hi,
After more testing, we have discovered that not all
jobs are failing, only the ones coming from the
HTCondor-CE.
According to the HTCondor release highlights, in
version 10.6.0 the executable is no longer renamed to
condor_exec.exe. Could the problem be related to this? I
do not know.
We have captured the StarterLog with D_ALL debugging for one
example job. We can send the log if necessary.
Thank you again.
Cheers,
Carles
Hello,
We have WNs in AlmaLinux 9 with HTCondor 10.5.0
that were running apparently fine. However, after
updating to 10.6.0 (or 10.7.0), new jobs are not
correctly executed. There are these errors in the
StarterLog.slotX_X:
09/01/23 07:23:30 (pid:54345) Create_Process succeeded, pid=54393
09/01/23 07:23:30 (pid:54345) Process exited, pid=54393, status=127
09/01/23 07:23:30 (pid:54345) JobReaper: condor_pid_ns_init didn't drop filename /home/execute/dir_54345/.condor_pid_ns_status (2)
09/01/23 07:23:30 (pid:54345) ERROR "Starter configured to use PID NAMESPACES, but libexec/condor_pid_ns_init did not run properly" at line 751 in file /var/lib/condor/execute/slot1/dir_3398586/userdir/build-ytPdzf/BUILD/condor-10.7.0/src/condor_starter.V6.1/vanilla_proc.cpp
09/01/23 07:23:30 (pid:54345) ShutdownFast all jobs.
I do not see any other hint in the StartLog:
106336 09/01/23 07:23:30 Starter pid 54345 exited with status 4
106337 09/01/23 07:23:30 slot1_1: State change: starter exited
106338 09/01/23 07:23:30 slot1_1: Changing activity: Busy -> Idle
Rereading the version history, I'm not sure
which change causes this error. Has anyone had a
similar problem?
Thank you in advance.
Best regards,
Carles
--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736