Re: [HTCondor-users] Preempt a job when memory usage to higher than requested, only if total system memory is getting low



Hi Christoph,

thanks for the suggestion, but it turns out we have
CGROUP_MEMORY_LIMIT_POLICY = soft already (been like this for years, it
seems).

What happens is that once the memory is exhausted, Linux's OOM killer
kicks in, and sometimes kills a job that was legitimately using the
amount of RAM it requested (we tend to have jobs that use from 10 to
128 GB of memory, so RAM fills up rather quickly).

I'm not sure Condor can give the OOM killer a hint to help it make an
informed decision about which job to kill first.


I'll check out the documentation about this SYSTEM_PERIODIC_* thing; I'm
not sure I understand how it works.

Thanks!

-- 
Charles


Beyer, Christoph wrote:
> Hi Charles,
> 
> did you check the option CGROUP_MEMORY_LIMIT_POLICY? I think it does pretty much what you want if you set it to soft.
> 
> The configuration variable CGROUP_MEMORY_LIMIT_POLICY controls this. If CGROUP_MEMORY_LIMIT_POLICY is set
> to the string hard, the hard limit will be set to the slot size, and the soft limit to 90% of the slot size. If set to soft, the
> soft limit will be set to the slot size and the hard limit will be set to the memory size of the whole startd. By default, this
> whole size is the detected memory size, minus RESERVED_MEMORY. Or, if MEMORY is defined, that value is
> used.
> 
> 
> We use the system periodic hold to put a limit on it (up to 3 times the requested memory is tolerated):
> 
> HoldOverMem = (ifThenElse(ResidentSetSize =!= UNDEFINED, ResidentSetSize,1) > 3000 * RequestMemory)
> HoldOverMemReason = "Memory usage higher than 3 x requested memory"
> 
> SYSTEM_PERIODIC_HOLD = $(HoldOverMem)
> SYSTEM_PERIODIC_HOLD_REASON = $(HoldOverMemReason)
> 
> Best
> christoph
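
For readers unsure how the quoted HoldOverMem expression behaves, here
is a minimal Python sketch of the same comparison. It assumes that
ResidentSetSize is reported in KiB and RequestMemory in MiB, which is
how the HTCondor job ClassAd attributes are usually documented; the
function and constant names are hypothetical and exist only for this
illustration. It is not how HTCondor evaluates the expression
internally.

```python
from typing import Optional

# Assumption: ResidentSetSize is in KiB and RequestMemory is in MiB.
KIB_PER_MIB = 1024
HOLD_MULTIPLIER = 3000  # the constant used in the quoted expression


def hold_over_mem(resident_set_size_kib: Optional[int],
                  request_memory_mib: int) -> bool:
    """Mirror of the quoted expression:

    ifThenElse(ResidentSetSize =!= UNDEFINED, ResidentSetSize, 1)
        > 3000 * RequestMemory

    None stands in for an UNDEFINED ResidentSetSize.
    """
    # An undefined RSS falls back to 1, so the job is not held
    # before any memory usage has been reported.
    rss = resident_set_size_kib if resident_set_size_kib is not None else 1
    return rss > HOLD_MULTIPLIER * request_memory_mib


def effective_factor() -> float:
    """Ratio of the hold threshold to the request, in the same unit."""
    return HOLD_MULTIPLIER / KIB_PER_MIB


if __name__ == "__main__":
    request_mib = 10 * 1024  # a job requesting 10 GiB
    print(hold_over_mem(None, request_mib))        # RSS not reported yet
    print(hold_over_mem(20_000_000, request_mib))  # about 19 GiB resident
    print(hold_over_mem(31_000_000, request_mib))  # about 29.6 GiB resident
    print(round(effective_factor(), 2))
```

Under those unit assumptions, the constant 3000 makes the threshold
slightly below 3 times the request (3000/1024); 3072 would be exactly
3 times. Note that this hold is based only on the job's own usage, not
on how much memory is still free on the machine.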