
[HTCondor-users] cgroup job scope



Hi all,

Quick question: is the scope for job cgroups intended for users to use, or is it primarily for Condor's own bookkeeping?

Together with a user group, we are debugging some of their jobs that run at our site and show behaviour we do not completely understand: their pilots try to create sub-slices/dirs in their job's cgroup dir in order to place their payload PIDs in a freshly created cgroup sub-branch.

Our EPs run RHEL9 with HTCondor 25.0.3.

The users' pilots try to access/write the scope dir inside the job's cgroup dir, e.g., /sys/fs/cgroup/{...BASE_GROUP/...}/_var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx/_var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx
- which fails because that path belongs to root.

Since the job's cgroup dir itself belongs to the executing user, the user can in principle create a subdir there and place their PIDs in its cgroup.procs [1].

AFAICS the scope cgroup branch contains all of a job's PIDs (e.g., _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx/cgroup.procs) while the base cgroup.procs is empty, so I guess the scope is meant for Condor's own bookkeeping and not for user usage. Is that right? Since the pilots are trying to branch inside the scope dir as in [2], I would guess that the jobs/pilots have picked up the wrong path (probably via their /proc/$$/cgroup) and would have to climb one level up to create a sub-branch of their own and move their PIDs into it. Is that correct?
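
To make the "climb one level up" idea concrete, here is a rough Python sketch of what a pilot could do. It assumes a pure cgroup v2 setup mounted at /sys/fs/cgroup, and that /proc/<pid>/cgroup really reports the root-owned scope dir (which is only my guess so far). The function names are made up for illustration and are not part of HTCondor or of any pilot. It only resolves paths; whether moving PIDs into a sub-branch of the job dir is actually permitted is exactly the open question.

```python
from pathlib import Path, PurePosixPath

# Assumption: unified cgroup v2 hierarchy mounted at the usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_v2_path(proc_cgroup_text: str) -> str:
    """Extract the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    The cgroup v2 entry has the form "0::<path>"; any cgroup v1 lines
    ("<id>:<controllers>:<path>") are skipped.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            return parts[2]
    raise ValueError("no cgroup v2 entry found")


def job_cgroup_dir(proc_cgroup_text: str) -> Path:
    """Return the directory one level above the cgroup the process sits in.

    If the process sits in the scope dir, this is the job's cgroup dir,
    i.e. the one that is owned by the executing user in [1].
    """
    parent = PurePosixPath(cgroup_v2_path(proc_cgroup_text)).parent
    return CGROUP_ROOT / parent.relative_to("/")


def current_job_cgroup_dir() -> Path:
    """Same as job_cgroup_dir(), but for the calling process."""
    return job_cgroup_dir(Path("/proc/self/cgroup").read_text())
```

A pilot would then create its sub-branch under current_job_cgroup_dir() instead of under the path it reads directly from /proc/self/cgroup.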

Cheers,
  Thomas

[1]
[atlasprd000@batch1408 _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx]$ pwd
/sys/fs/cgroup/system.slice/condordesy.service/condorjob.slice/_var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx
[atlasprd000@batch1408 _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx]$ stat .
  File: .
  Size: 0               Blocks: 0          IO Block: 4096   directory
Device: 19h/25d Inode: 65678964    Links: 4
Access: (0755/drwxr-xr-x)  Uid: (40250/atlasprd000)   Gid: ( 4025/atlasprd)
Context: system_u:object_r:cgroup_t:s0
Access: 2025-11-24 14:12:13.700171545 +0100
Modify: 2025-11-24 14:37:22.638182579 +0100
Change: 2025-11-24 14:37:22.638182579 +0100
 Birth: -
[atlasprd000@batch1408 _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx]$ mkdir thartanwashere.d

[atlasprd000@batch1408 _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx]$ ls -all thartanwashere.d/
total 0
drwxr-xr-x. 2 atlasprd000 atlasprd 0 Nov 24 14:37 .
drwxr-xr-x. 4 atlasprd000 atlasprd 0 Nov 24 14:37 ..
-r--r--r--. 1 atlasprd000 atlasprd 0 Nov 24 14:37 cgroup.controllers
-r--r--r--. 1 atlasprd000 atlasprd 0 Nov 24 14:37 cgroup.events
...


[2]
[atlasprd000@batch1408 _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx]$ mkdir _var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx/thartmanwasalsohere.d
mkdir: cannot create directory ‘_var_lib_condor_execute_slot1_8@xxxxxxxxxxxxxxxxxxxxxxx/thartmanwasalsohere.d’: Permission denied

