Mailing List Archives
Re: [HTCondor-users] Preempt a job when memory usage to higher than requested, only if total system memory is getting low
- Date: Thu, 16 Feb 2023 08:09:31 +0100 (CET)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: Re: [HTCondor-users] Preempt a job when memory usage to higher than requested, only if total system memory is getting low
Hi,
as Todd says, the RESERVED_MEMORY knob is vital for reserving a couple of MB that HTCondor cannot touch ...
I think we have set it to 2000MB ...
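
To make the effect of RESERVED_MEMORY concrete, here is a minimal Python sketch of the limit rules as the manual excerpt quoted further down this thread describes them. It only models that text and is not HTCondor's actual code. The function name cgroup_limits_mb and the 128000 MB / 16000 MB figures are made up for illustration; 2000 MB is the value mentioned above.

```python
from typing import Optional, Tuple


def cgroup_limits_mb(policy: str, slot_mb: int, detected_mb: int,
                     reserved_mb: int = 0,
                     memory_mb: Optional[int] = None) -> Tuple[int, int]:
    """Return (soft_limit_mb, hard_limit_mb) following the quoted manual text.

    Hypothetical helper for illustration only, not an HTCondor API.
    """
    # Whole-startd size: MEMORY if defined, otherwise detected memory
    # minus RESERVED_MEMORY.
    startd_mb = memory_mb if memory_mb is not None else detected_mb - reserved_mb
    if policy == "hard":
        # Hard limit at the slot size, soft limit at 90% of the slot size.
        return (slot_mb * 90) // 100, slot_mb
    if policy == "soft":
        # Soft limit at the slot size, hard limit at the whole startd size.
        return slot_mb, startd_mb
    raise ValueError(f"unknown policy: {policy!r}")


# Illustrative machine: 128000 MB detected, 2000 MB reserved, 16000 MB slot.
soft_limit, hard_limit = cgroup_limits_mb("soft", 16000, 128000, reserved_mb=2000)
```

Under these assumptions, a job in soft mode can grow past its 16000 MB slot up to 126000 MB, and the remaining 2000 MB stays out of reach of the jobs.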
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
----- Original Message -----
From: "Charles Goyard" <cgoyard@xxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 15 February 2023 21:42:33
Subject: Re: [HTCondor-users] Preempt a job when memory usage to higher than requested, only if total system memory is getting low
Hi Christoph,
thanks for the suggestion, but it turns out we have
CGROUP_MEMORY_LIMIT_POLICY = soft already (been like this for years, it
seems).
What happens is that once the memory is exhausted, Linux's OOM killer
kicks in, and sometimes kills a job that was legitimately using the
amount of RAM it requested (we tend to have jobs that use from 10 to
128GB of memory, so RAM fills up rather quickly).
I'm not sure Condor can give the OOM killer a hint so that it makes an
informed decision about which job to kill first.
I'll check out the documentation about this SYSTEM_PERIODIC_* thing; I'm
not sure I understand how it works.
Thanks!
--
Charles
Beyer, Christoph wrote:
> Hi Charles,
>
> did you check the option CGROUP_MEMORY_LIMIT_POLICY? I think it does pretty much what you want if you set it to soft.
>
> The configuration variable CGROUP_MEMORY_LIMIT_POLICY controls this. If CGROUP_MEMORY_LIMIT_POLICY is set
> to the string hard, the hard limit will be set to the slot size, and the soft limit to 90% of the slot size. If set to soft, the
> soft limit will be set to the slot size and the hard limit will be set to the memory size of the whole startd. By default, this
> whole size is the detected memory size, minus RESERVED_MEMORY. Or, if MEMORY is defined, that value is
> used.
>
>
> We use the system periodic hold to put a limit on it (up to 3 times the requested memory is tolerated)
>
> HoldOverMem = (ifThenElse(ResidentSetSize =!= UNDEFINED, ResidentSetSize,1) > 3000 * RequestMemory)
> HoldOverMemReason = "Memory usage higher than 3 x requested memory"
>
> SYSTEM_PERIODIC_HOLD = $(HoldOverMem)
> SYSTEM_PERIODIC_HOLD_REASON = $(HoldOverMemReason)
>
> Best
> christoph
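
The hold expression quoted above compares two attributes that, to my knowledge, use different units: ResidentSetSize is reported in KiB and RequestMemory in MiB, so the factor 3000 works out to roughly 3 times the request (about 2.93 times, since 1 MiB is 1024 KiB). Below is a minimal Python sketch of the same logic, with None standing in for UNDEFINED. The function name and the 10 GiB example are assumptions for illustration, and it ignores the case where RequestMemory itself is undefined.

```python
from typing import Optional


def hold_over_mem(resident_set_size_kib: Optional[int],
                  request_memory_mib: int) -> bool:
    """Mirror of the quoted HoldOverMem expression (illustrative only)."""
    # ifThenElse(ResidentSetSize =!= UNDEFINED, ResidentSetSize, 1):
    # fall back to 1 KiB when no usage has been reported yet, so the
    # comparison never evaluates to UNDEFINED.
    rss_kib = resident_set_size_kib if resident_set_size_kib is not None else 1
    return rss_kib > 3000 * request_memory_mib


# A job requesting 10 GiB (10240 MiB) has a threshold of 30,720,000 KiB.
request_mib = 10_240
threshold_kib = 3000 * request_mib

hold_over_mem(threshold_kib, request_mib)      # False: exactly at the threshold
hold_over_mem(threshold_kib + 1, request_mib)  # True: just over the threshold
hold_over_mem(None, request_mib)               # False: no usage reported yet
```

Note that this hold is based purely on the job's own usage relative to its request; it does not look at how much memory is left on the machine.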