Hello Carles,
We do not know of any issues with PID namespaces. However, it is possible that it is no longer working properly. We will try to reproduce the problem here.
...Tim
Hello again,
I updated the CE and the Alma9 test WNs to HTCondor 10.8.0, but the jobs continued to fail. As a last resort, I set USE_PID_NAMESPACES to false on the Alma9 WNs. After that, the CE jobs started to run again.
Was an issue introduced in HTCondor 10.6.0 that affects CE jobs and PID namespaces on AlmaLinux 9?
As I mentioned, this issue started with HTCondor 10.6.0 on AlmaLinux 9 WNs and apparently affects only the jobs routed from a CE.
Cheers,
Carles
On Tue, 5 Sept 2023 at 08:14, Carles Acosta <cacosta@xxxxxx> wrote:
Hi,
After more testing, we have discovered that not all jobs are failing, only the ones coming from the HTCondor-CE.
According to the HTCondor release highlights, in version 10.6.0 the executable is no longer renamed to condor_exec.exe. Could the problem be related to this? I do not know.
We have captured a StarterLog with D_ALL debugging enabled for one example job. We can send the log if necessary.
Thank you again.
Cheers,
Carles
On Fri, 1 Sept 2023 at 16:14, Carles Acosta <cacosta@xxxxxx> wrote:
Hello,
We have AlmaLinux 9 WNs with HTCondor 10.5.0 that were apparently running fine. However, after updating to 10.6.0 (or 10.7.0), new jobs fail to execute correctly. The StarterLog.slotX_X shows these errors:
09/01/23 07:23:30 (pid:54345) Create_Process succeeded, pid=54393
09/01/23 07:23:30 (pid:54345) Process exited, pid=54393, status=127
09/01/23 07:23:30 (pid:54345) JobReaper: condor_pid_ns_init didn't drop filename /home/execute/dir_54345/.condor_pid_ns_status (2)
09/01/23 07:23:30 (pid:54345) ERROR "Starter configured to use PID NAMESPACES, but libexec/condor_pid_ns_init did not run properly" at line 751 in file /var/lib/condor/execute/slot1/dir_3398586/userdir/build-ytPdzf/BUILD/condor-10.7.0/src/condor_starter.V6.1/vanilla_proc.cpp
09/01/23 07:23:30 (pid:54345) ShutdownFast all jobs.
I do not see any other hint in the StartLog:
09/01/23 07:23:30 Starter pid 54345 exited with status 4
09/01/23 07:23:30 slot1_1: State change: starter exited
09/01/23 07:23:30 slot1_1: Changing activity: Busy -> Idle
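For anyone who wants to check whether their own WNs show the same signature, here is a minimal Python sketch that scans StarterLog lines for the condor_pid_ns_init messages quoted above and reports the starter pid together with the exit status of the process that exited just before. It is not an HTCondor tool: the function name find_pid_ns_failures and the regular expressions are assumptions based only on the line layout in this excerpt, and other versions or debug levels may format lines differently.

```python
import re

# Line layout assumed from the excerpt above: "MM/DD/YY HH:MM:SS (pid:N) message".
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) (.*)$")
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(\d+)")
NS_MARKER = "condor_pid_ns_init"


def find_pid_ns_failures(lines):
    """Return one record per starter pid whose log mentions condor_pid_ns_init.

    Hypothetical helper for illustration; it only pattern-matches log text.
    """
    last_exit = {}  # starter pid -> (job pid, exit status) of the last exited process
    failures = {}   # starter pid -> record, keeping the first matching line only
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        timestamp, message = match.group(1), match.group(3)
        starter_pid = int(match.group(2))
        exited = EXIT_RE.search(message)
        if exited:
            last_exit[starter_pid] = (int(exited.group(1)), int(exited.group(2)))
        elif NS_MARKER in message and starter_pid not in failures:
            job_pid, status = last_exit.get(starter_pid, (None, None))
            failures[starter_pid] = {
                "timestamp": timestamp,
                "starter_pid": starter_pid,
                "job_pid": job_pid,
                "exit_status": status,
            }
    return list(failures.values())
```

To use it, pass the lines of a StarterLog.slotX_X file (for example, an open file object) to find_pid_ns_failures and print the returned records. On the excerpt above it should report starter pid 54345 with job pid 54393 and exit status 127.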
After rereading the version history, I'm not sure which change causes this error. Has anyone had a similar problem?
Thank you in advance.
Best regards,
Carles
--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es
--
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736