
[HTCondor-users] recommended systemd cgroup slicing setup?



Hi all,

an observation regarding Condor23 on EL9 with cgroups v2 [1]:

We have put the Condor cgroup below the system slice as

> condor_config_val BASE_CGROUP
system.slice/condor.slice

which looks OK and job slices are put underneath as expected.

However, I just noticed that systemd reports the cgroup for the service as [2]
  CGroup: /system.slice/condor.service
which exists in parallel and, as the parent group, actually owns the starter and the job children [3].

I am a bit confused about whether it is actually a good idea, or even necessary, to set the base cgroup path for Condor relative to the system slice. That is, with a base Condor path right at the root of the cgroup mount, all Condor daemon PIDs would still be arranged under the system slice by default, right? If one wants, for consistency, the paths to reflect the hierarchy as well, the best way would probably be to use `system.slice/condor.service` as BASE_CGROUP, so that the child sub-groups also sit along the "right" paths according to the hierarchy of their PIDs. Or might there be a drawback?
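
To make the mismatch concrete, here is a minimal sketch of how one could check whether a process sits below a given base cgroup. The helper names `cgroup_v2_path` and `is_under` are hypothetical and not part of HTCondor or systemd; the only inputs are the text of /proc/<pid>/cgroup and a BASE_CGROUP-style path, and the sample values are the ones quoted in [2].

```python
from pathlib import PurePosixPath


def cgroup_v2_path(proc_cgroup_text: str) -> str:
    """Return the cgroup v2 path found in the text of /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        # A cgroup v2 entry has the form "0::<path>".
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            return parts[2]
    raise ValueError("no cgroup v2 entry found")


def is_under(cgroup_path: str, base_cgroup: str) -> bool:
    """True if cgroup_path equals base_cgroup or lies somewhere below it."""
    # Normalise both to absolute paths; BASE_CGROUP is given without a leading "/".
    path = PurePosixPath("/") / cgroup_path.lstrip("/")
    base = PurePosixPath("/") / base_cgroup.lstrip("/")
    return path == base or base in path.parents


# Starter PID 55715 as reported in [2] (hypothetical helper, real sample data).
starter = cgroup_v2_path("0::/system.slice/condor.service\n")
print(is_under(starter, "system.slice/condor.slice"))    # configured BASE_CGROUP
print(is_under(starter, "system.slice/condor.service"))  # systemd's service cgroup
```

On a live node, the text would come from reading /proc/<pid>/cgroup for the starter PID instead of the literal string used here.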

Cheers,
  Thomas


[1]
condor-23.0.4-1.el9.x86_64
condor-stash-plugin-6.12.1-1.x86_64
python3-condor-23.0.4-1.el9.x86_64

systemd-252-18.el9.x86_64
systemd-libs-252-18.el9.x86_64
systemd-pam-252-18.el9.x86_64
systemd-rpm-macros-252-18.el9.noarch
systemd-udev-252-18.el9.x86_64

Linux batch1532.desy.de 5.14.0-362.24.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 15 07:18:13 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

[2]
● condor.service - Condor Distributed High-Throughput-Computing
     Loaded: loaded (/usr/lib/systemd/system/condor.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/condor.service.d
             └─01-condor-basic-overwrites.conf
     Active: active (running) since Wed 2024-04-17 15:10:54 CEST; 16min ago
    Process: 4661 ExecStartPre=/usr/bin/mkdir -p /var/run/condor (code=exited, status=0/SUCCESS)
    Process: 4738 ExecStartPre=/usr/bin/chown -R condor:condor /var/run/condor (code=exited, status=0/SUCCESS)
   Main PID: 4820 (condor_master)
     Status: "All daemons are responding"
      Tasks: 6 (limit: 4194303)
     Memory: 160.4M
        CPU: 49.897s
     CGroup: /system.slice/condor.service
             ├─ 4820 /usr/sbin/condor_master -f
             ├─ 7001 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 25411
             ├─ 7003 condor_shared_port -p 9620
             ├─ 7004 condor_startd
             ├─55715 condor_starter -f -local-name slot_type_1 -a slot1_2 grid-htc-ce03.desy.de
             └─61136 condor_starter -f -local-name slot_type_1 -a slot1_5 grid-htc-ce03.desy.de

> cat /proc/55715/cgroup
0::/system.slice/condor.service

> cat /sys/fs/cgroup/system.slice/condor.slice/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxxx/cgroup.procs
55717

[3]
> cat /sys/fs/cgroup/system.slice/condor.service/cgroup.procs
4820
7001
7003
7004
55715
139357
