
Re: [HTCondor-users] Zero MemoryUsage after update to 10.x



After much tinkering I have come to the conclusion that the cgroups v2 implementation in HTCondor is buggy (I think it may have been implemented before the API had entirely stabilized?), so the solution has been to disable cgroups v2, which is enabled by default in Debian 11 (bullseye), and use v1 instead. This is done by passing a couple of extra arguments on the kernel command line. If you are using GRUB, this can be achieved by editing /etc/default/grub and adding them to the "GRUB_CMDLINE_LINUX" parameter:

GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=false systemd.legacy_systemd_cgroup_controller=false"
...and then running "update-grub". After the next reboot v2 will be disabled. Memory limits are now correctly enforced (whether hard or soft), memory usage is reported correctly, and jobs are held (instead of removed) with a proper hold reason.
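To confirm after the reboot which cgroup mode is actually in effect, something like the following sketch may help. It assumes the usual mount point /sys/fs/cgroup and GNU stat; the helper name cgroup_mode is only illustrative and is not part of HTCondor or systemd.

```shell
#!/bin/sh
# Classify the cgroup mode from the filesystem type mounted at /sys/fs/cgroup.
# cgroup_mode is a hypothetical helper written for this sketch.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2 (unified)" ;;          # pure cgroups v2
        tmpfs)     echo "v1 (legacy or hybrid)" ;; # v1 controllers mounted under a tmpfs
        *)         echo "unknown ($1)" ;;
    esac
}

# GNU stat: -f queries the filesystem, %T prints its type in human-readable form.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo missing)
cgroup_mode "$fstype"
```

If this still reports v2, checking /proc/cmdline shows whether the new kernel arguments were actually picked up at boot.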

By the way, while debugging (with v2 enabled) I tried setting "CGROUP_MEMORY_LIMIT_POLICY" to none (the default value) and I still got erroneous OOM kill signals. This might be related to the buggy implementation.

Regards,
Javier Barbero


On 5/2/23 at 6:36, Greg Thain via HTCondor-users wrote:

On 2/3/23 3:42 AM, Javier Barbero Gómez wrote:


We are using Debian 11 on all machines, and all packages were updated at the same time as HTCondor (which was updated to the latest release available in the Debian repo, 10.2.0). The kernel version is 5.10.0-21-amd64.

Javier:


Can you send me (directly, off-list) the StarterLog.slotXXX for one of the jobs in question?


Thanks,


-greg

