Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

[HTCondor-users] Use of cgroups

Date: Mon, 28 Aug 2023 03:56:56 +0000
From: Peter Ellevseth <Peter.Ellevseth@xxxxxxxxxx>
Subject: [HTCondor-users] Use of cgroups

Hi all

I am struggling to understand how the cgroup mechanism affects my jobs. I have added a fresh node to our cluster and started a lot of jobs on it, but all of a sudden it starts killing my jobs. I have traced this back to the OOM killer. However, the execute machine has 250GB of memory and my jobs are not using anywhere close to that.

I wanted to try to tune the OOM killer, but I can't seem to find the relevant services (systemd-oomd; the OS is Ubuntu 22.04). I also haven't found out how to disable it.

Right now I am able to run about 40 jobs (the node has 48 cores). Each uses about 0.5% of total memory. When I submit more jobs, the OOM killer steps in and kills them.
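
For scale, here is a rough back-of-the-envelope check of those numbers. It is only a sketch: the inputs are the approximate figures quoted in this message (250GB, 40 jobs, about 0.5% each), and the variable names are made up for illustration.

```python
# Rough estimate of aggregate job memory, using the approximate
# figures from this message (not measured values).
TOTAL_MEMORY_GB = 250        # physical memory on the execute node
RUNNING_JOBS = 40            # jobs that run before the OOM killer steps in
PERCENT_PER_JOB = 0.5        # each job's share of total memory, in percent

# Memory used by one job, in GB.
per_job_gb = TOTAL_MEMORY_GB * PERCENT_PER_JOB / 100

# Memory used by all running jobs together, in GB and as a percentage.
all_jobs_gb = RUNNING_JOBS * per_job_gb
all_jobs_percent = RUNNING_JOBS * PERCENT_PER_JOB

print(f"per job:  {per_job_gb:.2f} GB")
print(f"all jobs: {all_jobs_gb:.1f} GB ({all_jobs_percent:.0f}% of the node)")
```

By this estimate the jobs together account for only about a fifth of the node's physical memory, which is why the kills are surprising.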

I am also noticing that the OS seems to be using a lot of swap even when there is a lot of physical memory available.

Are there any knobs in HTCondor I can tune to help with this?

Peter Ellevseth
Principal Advisor / Principal Advisor
	+47 93 43 56 01 / +47 73 90 05 00
	peter.ellevseth@xxxxxxxxxx
	safetec.no

Follow-Ups:
- Re: [HTCondor-users] Use of cgroups
  - From: Marco van Zwetselaar
- Re: [HTCondor-users] Use of cgroups
  - From: Thomas Hartmann

