
Re: [HTCondor-users] Cgroups & pid namespaces



On 8/2/24 18:15, Todd L Miller via HTCondor-users wrote:
>> with enabled USE_PID_NAMESPACES jobs did not start and failed with
>>
>> [snip]
>>
>> Is this expected behavior for
>
> Not as far as I know.  (For the future, the version of the OS is frequently more important than the version of the kernel.)  I guess the first thing I'd check is to make sure that htcondor "slice" exists.

I thought the strace output in "[snip]" shows that the htcondor slice exists. Doesn't the following line prove that the right "directory" exists,
215077 mkdir("/sys/fs/cgroup/system.slice/htcondor/condor_scratch_condor_slot1_1@xxxxxxxxxxxxxxxxxxxxxxxx", 0755) = 0
and the next line that the HTCondor process is able to open cgroup.procs there for writing,
215077 openat(AT_FDCWD</scratch/condor/dir_213192>, "/sys/fs/cgroup/system.slice/htcondor/condor_scratch_condor_slot1_1@xxxxxxxxxxxxxxxxxxxxxxxx/cgroup.procs", O_WRONLY) = 9</sys/fs/cgroup/system.slice/htcondor/condor_scratch_condor_slot1_1@xxxxxxxxxxxxxxxxxxxxxxxx/cgroup.procs>
whereas the next log entry really confuses me?
215077 write(9</sys/fs/cgroup/system.slice/htcondor/condor_scratch_condor_slot1_1@xxxxxxxxxxxxxxxxxxxxxxxx/cgroup.procs>, "215077", 6) = -1 ESRCH (No such process)
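
To check which of these three steps fails outside of HTCondor, the same mkdir / openat / write sequence can be repeated by hand. Below is a minimal Python sketch of that idea. The function name `replay_cgroup_join` is hypothetical, this is not HTCondor's code, and it assumes cgroup v2 (where the kernel creates `cgroup.procs` itself) and sufficient privileges on the target directory.

```python
import errno
import os


def replay_cgroup_join(cgroup_dir, pid):
    """Repeat the mkdir / openat / write sequence seen in the strace output.

    Returns a list of (step, outcome) pairs, where outcome is "ok" or an
    errno name such as "ESRCH". Stops at the first failing step.
    This is a hypothetical diagnostic helper, not part of HTCondor.
    """
    results = []
    fd = None
    step = "mkdir"
    try:
        try:
            os.mkdir(cgroup_dir, 0o755)
        except FileExistsError:
            pass  # an already existing directory is fine for this check
        results.append((step, "ok"))

        step = "openat"
        # No O_CREAT here: on cgroup v2 the kernel provides cgroup.procs.
        fd = os.open(os.path.join(cgroup_dir, "cgroup.procs"), os.O_WRONLY)
        results.append((step, "ok"))

        step = "write"
        os.write(fd, str(pid).encode())
        results.append((step, "ok"))
    except OSError as exc:
        # Translate the numeric errno into its symbolic name when known.
        results.append((step, errno.errorcode.get(exc.errno, str(exc.errno))))
    finally:
        if fd is not None:
            os.close(fd)
    return results
```

Calling it as root with a scratch directory under `/sys/fs/cgroup/system.slice/htcondor/` and the PID of a test process, once from the host and once from inside a PID namespace, would show whether the failing `write` can be reproduced without the starter.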

What is your minimal HTCondor 23 configuration for EL9 with USE_PID_NAMESPACES=true that works?


BTW: Red Hat and its clones don't provide lsb_release by default, but on the other hand the OS version of a RHEL clone is part of the kernel version, 5.14.0-427.24.1.el9_4.x86_64 (and it is also part of the condor-23.8.1-1.el9.x86_64 package version). Anyway, to be concrete, this system uses Alma9.
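
Without lsb_release, the OS name and version can still be read from `/etc/os-release`, which EL9 systems ship. A minimal Python sketch follows. The helper names are hypothetical and the quote handling is simplified (it only strips surrounding quotes and does not process shell escapes).

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict.

    Simplified: surrounding single or double quotes are stripped, but
    shell escape sequences inside values are not interpreted.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def read_os_release(path="/etc/os-release"):
    """Read and parse the os-release file at the given path."""
    with open(path, encoding="utf-8") as fh:
        return parse_os_release(fh.read())
```

For a bug report, the `NAME` and `VERSION_ID` entries of `read_os_release()` are usually the ones worth quoting.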

Petr