
Re: [HTCondor-users] Preempt a job when memory usage is higher than requested, only if total system memory is getting low



Hi Charles et al.,

Some quick thoughts re the below noisy neighbor / cgroup memory issues:

1. Are you running on Debian or Ubuntu?  If yes, be aware that a bug in the enforcement of memory limits on those Linux distros was just fixed last month in HTCondor v10.0.1+ in the LTS channel and HTCondor v10.2.0+ in the feature channel.  See the release notes, or the nerdy details here: https://opensciencegrid.atlassian.net/browse/HTCONDOR-1466, but the upshot is that nothing related to memory enforcement was working correctly on those distros until recently (other distros like CentOS, Fedora, and Red Hat were fine).

2. Personally I am not a fan of "CGROUP_MEMORY_LIMIT_POLICY=soft".  Setting the policy to "soft" will always result in badly configured jobs penalizing good citizens.  Even worse, IMHO, it leads to non-deterministic and unpredictable behavior from the perspective of end users, e.g. "hey admin, my job ran just fine to completion last week but it got killed this week, why???".  Better to leave CGROUP_MEMORY_LIMIT_POLICY=hard (the default) and make things clear and predictable for users: if your job uses more memory than you requested at submit time, it will be killed.
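
To spell out what "requested at submit time" means, here is a hypothetical submit file fragment (the executable name and memory value are made up for illustration).  With the hard policy, request_memory becomes the cgroup hard limit for the job, so going above 4096 MB gets the job killed:

   executable     = my_analysis.sh
   request_memory = 4096M
   queue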

3. See the Manual for the CGROUP_MEMORY_LIMIT_POLICY knob to understand how the soft and hard limits are set (you can also customize them yourself if you are an expert).  The default 'hard' policy sets the hard limit at the size of the slot and the soft limit at 90% of the size of the slot.
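
For reference, the knob lives in the condor_config on the execute node.  A minimal sketch, just spelling out the default explicitly (the "soft" value discussed above is the main alternative):

   # Enforce request_memory as a hard cgroup limit on each job (the default)
   CGROUP_MEMORY_LIMIT_POLICY = hard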

4. By default, HTCondor attempts to direct the OOM killer away from jobs that are using less than 90% of the cgroup soft limit on the slot.  Considering the default soft limit is 90% of the slot memory (see above), this effectively means that if the job is at or below roughly 80% of its requested memory it won't be killed by the OOM killer.  Why go with 90% of the soft limit vs 100%, you ask?  Unfortunately, it appears that the OOM killer sees a different (larger) value for usage than the one reported by cgroups (not sure why, perhaps it includes memory used in kernel data structures, etc.).  For the nerds, here is where that happens:
https://github.com/htcondor/htcondor/blob/main/src/condor_starter.V6.1/vanilla_proc.cpp#L1275-L1311
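
To make that concrete with made-up numbers, assuming the default 90% soft limit described above:

   request_memory           = 10000 MB
   cgroup soft limit        = 0.90 * 10000 = 9000 MB
   OOM-protection threshold = 0.90 *  9000 = 8100 MB   (~81% of the request)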

5. Cgroup memory limits are limits, not reservations.  By default, HTCondor considers all the physical memory of your machine as available to be used by HTCondor jobs.  If other services/processes outside of HTCondor are pulling the "memory rug" out from underneath the startd, all bets are off and who knows what the OOM killer will kill.  To tell HTCondor about memory consumed by services running on the server outside of HTCondor, you really must use the config knob RESERVED_MEMORY (example below).  Common memory-stealing culprits are other daemons running on the machine (web proxy services, puppet/chef, etc.) and/or shared filesystem services, including FUSE mounts.
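
For instance, a minimal condor_config sketch (the 4096 is an arbitrary example value; RESERVED_MEMORY is specified in megabytes and is subtracted from the detected physical memory before slots are advertised):

   # Hold back 4 GB of physical RAM for non-HTCondor services
   # (web proxies, FUSE mounts, config management agents, ...)
   RESERVED_MEMORY = 4096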

Hope the above ramblings are helpful,
Todd
 

On 2/15/2023 12:49 PM, Charles Goyard wrote:
Hi,

now that we have dynamic slots on our pool, we enjoy the noisy neighbor problem.

That is, some users correctly set their request_memory parameter, and some don't. This can lead to an unfair situation where badly configured jobs penalize the good citizens.

I found the configuration template to evict jobs that use more memory than requested, and I'm planning to put it to good use. But let's add a grain of salt.

What I would like to achieve is to allow jobs to eat more cake than expected, as long as there is no memory pressure at the system level (a bit like how group quota surplus works).

How can I come up with an expression that evaluates the total free (or used) memory on a compute node? Can I gather memory information from other slots?

Something like:

PREEMPT=( (MemoryUsage > Memory) && ( SumOfMemoryUsageAcrossSlots > ( TotalComputerMemory * 0.95 ) ) )


Thanks !



-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685