
Re: [HTCondor-users] ENOENT writing to cgroup.subtree_control, but file exists



Hi Max,

Could this be related to SELinux?

Best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Max Fischer, SCC" <max.fischer@xxxxxxx>
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Sent: Monday, 8 July 2024 13:31:20
Subject: [HTCondor-users] ENOENT writing to cgroup.subtree_control, but file exists

Hi all,

We just discovered that HTCondor consistently fails to create cgroups (v2) for jobs in our cluster. However, I'm at a loss as to what is causing this.

The StarterLog [0] first reports the htcondor cgroup as writeable, but then fails writing to it with ENOENT, which seems to cause every subsequent cgroup setup step to fail as well. When I check the cgroup.subtree_control file of the htcondor root cgroup, it has existed for ages [1].

Is the error report masking some other error that makes this fail? Are there any obvious steps we might have missed when preparing cgroups?

We're on RHEL8 (yes, we missed the RHEL9 train) and are running HTCondor 23.7.2. It looks like the relevant code [2] hasn't been changed in 23.8.1, so we haven't considered updating as a mitigation.

Cheers,
Max

[0] /var/log/condor/StarterLog.slot1_14
07/08/24 04:04:38 (pid:738500) (D_ALWAYS) Checking to see if htcondor is a writeable cgroup
07/08/24 04:04:38 (pid:738500) (D_ALWAYS)     Cgroup /htcondor is useable
...
07/08/24 04:04:38 (pid:738504) (D_ALWAYS) ProcFamilyDirectCgroupV2::track_family_via_cgroup error writing to /sys/fs/cgroup/htcondor/cgroup.subtree_control: No such file or directory
07/08/24 04:04:38 (pid:738504) (D_ALWAYS) Error setting cgroup cpu weight of 800 in cgroup /sys/fs/cgroup/htcondor/condor_tmp_condor_execute_slot1_14@xxxxxxxxxxxxxxxxxxxxx: No such file or directory
07/08/24 04:04:38 (pid:738504) (D_ALWAYS) Error enabling per-cgroup oom killing: 2 (No such file or directory)

[1] ls -l /sys/fs/cgroup/htcondor/cgroup.subtree_control
-rw-r--r-- 1 root root 0 Jun 21 18:16 /sys/fs/cgroup/htcondor/cgroup.subtree_control

[2] https://github.com/htcondor/htcondor/blob/8cf018d14d7e198ffb1f3535326a3d8a22b52186/src/condor_utils/proc_family_direct_cgroup_v2.cpp#L181-L198
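
One thing that may be worth ruling out: as far as I understand cgroup v2, a write to cgroup.subtree_control can return ENOENT even though the file itself exists, namely when one of the requested controllers is not listed in that cgroup's cgroup.controllers. The sketch below only reads the two files and reports which controllers are missing. The helper name and the default controller list (cpu, io, memory, pids) are assumptions for illustration, not taken from the HTCondor source; adjust the list to whatever the starter actually tries to enable.

```python
from pathlib import Path


def missing_controllers(cgroup_dir, wanted=("cpu", "io", "memory", "pids")):
    """Return the controllers in `wanted` that `cgroup_dir` does not offer.

    `wanted` is an assumed list for illustration; it is not read from HTCondor.
    """
    # cgroup.controllers lists the controllers available to this cgroup,
    # separated by whitespace, e.g. "cpuset cpu io memory pids".
    available = Path(cgroup_dir, "cgroup.controllers").read_text().split()
    return [name for name in wanted if name not in available]


def enabled_for_children(cgroup_dir):
    """Return the controllers currently enabled in cgroup.subtree_control."""
    return Path(cgroup_dir, "cgroup.subtree_control").read_text().split()
```

Calling missing_controllers("/sys/fs/cgroup/htcondor") on an affected node and getting a non-empty list would point at the parent cgroup not delegating those controllers, rather than at a missing file.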

