
Re: [HTCondor-users] Preempt a job when memory usage is higher than requested, only if total system memory is getting low



Hi Charles,

Did you check the option CGROUP_MEMORY_LIMIT_POLICY? I think it does pretty much what you want if you set it to soft.

The configuration variable CGROUP_MEMORY_LIMIT_POLICY controls this. If CGROUP_MEMORY_LIMIT_POLICY is set
to the string hard, the hard limit will be set to the slot size, and the soft limit to 90% of the slot size. If set to soft, the
soft limit will be set to the slot size and the hard limit will be set to the memory size of the whole startd. By default, this
whole size is the detected memory size, minus RESERVED_MEMORY. Or, if MEMORY is defined, that value is
used.
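To make the two policies concrete, here is a small Python model of the limits the quoted text describes. It is only a sketch of the arithmetic, not HTCondor code: the function and parameter names are hypothetical, all sizes are assumed to be in MiB, and rounding 90% down to a whole MiB is my own assumption.

```python
from typing import Optional, Tuple


def cgroup_limits(
    policy: str,
    slot_mib: int,
    detected_mib: int,
    reserved_mib: int = 0,
    memory_override_mib: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (soft_limit_mib, hard_limit_mib) for policy 'hard' or 'soft'.

    Hypothetical model of CGROUP_MEMORY_LIMIT_POLICY as described above.
    """
    # Whole startd size: MEMORY if defined, else detected memory minus RESERVED_MEMORY.
    if memory_override_mib is not None:
        startd_mib = memory_override_mib
    else:
        startd_mib = detected_mib - reserved_mib

    if policy == "hard":
        # Hard limit at the slot size, soft limit at 90% of it (rounded down here).
        return slot_mib * 9 // 10, slot_mib
    if policy == "soft":
        # Soft limit at the slot size, hard limit at the size of the whole startd.
        return slot_mib, startd_mib
    raise ValueError(f"policy not modelled here: {policy!r}")


# A 4 GiB slot on a 64 GiB node with 1 GiB of RESERVED_MEMORY:
print(cgroup_limits("hard", 4096, 65536, 1024))  # (3686, 4096)
print(cgroup_limits("soft", 4096, 65536, 1024))  # (4096, 64512)
```

With soft, a job can grow well past its slot size until the node as a whole runs short, which is roughly the "eat more cake while there is room" behaviour Charles asks for.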


We use the system periodic hold to put a limit on it (3 times the requested memory is tolerated):

HoldOverMem = (ifThenElse(ResidentSetSize =!= UNDEFINED, ResidentSetSize,1) > 3000 * RequestMemory)
HoldOverMemReason = "Memory usage higher than 3 x requested memory"

SYSTEM_PERIODIC_HOLD = $(HoldOverMem)
SYSTEM_PERIODIC_HOLD_REASON = $(HoldOverMemReason)
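The factor 3000 in HoldOverMem folds a unit conversion into the tolerance: as far as I know, ResidentSetSize is reported in KiB while RequestMemory is in MiB, so 3000 is roughly 3 x 1024 (an exact 3x would be 3072). The Python sketch below mirrors the expression to show the effect; the function name and the configurable factor are hypothetical, for illustration only.

```python
from typing import Optional


def hold_over_mem(
    resident_set_size_kib: Optional[int],
    request_memory_mib: int,
    kib_per_mib_factor: int = 3000,
) -> bool:
    """Mirror of the HoldOverMem expression (hypothetical helper)."""
    # ifThenElse(ResidentSetSize =!= UNDEFINED, ResidentSetSize, 1)
    rss = resident_set_size_kib if resident_set_size_kib is not None else 1
    # MiB request scaled to KiB and tripled in one step (approximately).
    return rss > kib_per_mib_factor * request_memory_mib


print(hold_over_mem(None, 2048))        # False: no usage reported yet
print(hold_over_mem(6_000_000, 2048))   # False: about 2.9x a 2 GiB request
print(hold_over_mem(6_200_000, 2048))   # True: above 3000 * 2048 KiB
print(hold_over_mem(6_200_000, 2048, kib_per_mib_factor=3072))  # False with exact 3x
```

The last line shows the approximation is slightly stricter than a true 3x; for a hold threshold that difference hardly matters.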

Best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Charles Goyard" <cgoyard@xxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 15 February 2023 19:49:42
Subject: [HTCondor-users] Preempt a job when memory usage is higher than requested, only if total system memory is getting low

Hi,

Now that we have dynamic slots in our pool, we are enjoying the noisy
neighbor problem.

That is, some users correctly set their request_memory parameter, and 
some don't. This can lead to an unfair situation where badly configured 
jobs penalize the good citizens.

I found the configuration template to evict jobs that use more memory
than requested, and I'm planning to put it to good use. But I'd like to
add a twist.

What I would like to achieve is to allow jobs to eat more cake than
expected, as long as there is no memory pressure at the system level (a
bit like how group quota surplus works).

How can I come up with an expression that evaluates the total free (or 
used) memory on a compute node? Can I gather memory information from 
other slots?

Something like:

PREEMPT = (MemoryUsage > Memory) && (SumOfMemoryUsageAcrossSlots > (TotalComputerMemory * 0.95))
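The decision the pseudo-expression asks for can be modelled in a few lines of Python. This only sketches the intended logic: SumOfMemoryUsageAcrossSlots and TotalComputerMemory are placeholder names from the question, not known ClassAd attributes, and how to expose a machine-wide total to a slot's PREEMPT expression is exactly the open question here. The Slot class and slots_to_preempt function are hypothetical, with all values assumed to be in MiB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    name: str
    memory_mib: int        # provisioned slot size (plays the role of Memory)
    memory_usage_mib: int  # measured usage (plays the role of MemoryUsage)


def slots_to_preempt(
    slots: List[Slot], total_machine_mib: int, pressure_fraction: float = 0.95
) -> List[str]:
    """Names of slots over their request, but only under machine-wide pressure."""
    used = sum(s.memory_usage_mib for s in slots)  # SumOfMemoryUsageAcrossSlots
    if used <= total_machine_mib * pressure_fraction:
        # No system-level pressure: over-consumers may keep running.
        return []
    return [s.name for s in slots if s.memory_usage_mib > s.memory_mib]


node = [
    Slot("slot1_1", memory_mib=2048, memory_usage_mib=6000),  # over its request
    Slot("slot1_2", memory_mib=8192, memory_usage_mib=8000),  # within its request
]
print(slots_to_preempt(node, total_machine_mib=16384))  # []
print(slots_to_preempt(node, total_machine_mib=14000))  # ['slot1_1']
```

Only the over-consuming slot is selected once the node crosses the threshold, so well-behaved jobs are never evicted on behalf of a noisy neighbor.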


Thanks!

-- 
Charles