
Re: [HTCondor-users] More thoughts on memory limits



Since 23.10.1, HTCondor no longer uses memory.peak but memory.current instead:

https://htcondor.readthedocs.io/en/latest/version-history/feature-versions-23-x.html#version-23-10-1

If I understand this update correctly: in the past, MemoryUsage reported the maximum memory used, and since 23.10.1 this ClassAd attribute contains the last known memory usage value. So it no longer makes much sense to look at this value for finished jobs.

Still, if your schedd is "lucky" when evaluating your memory policy expression, then memory.current can be at the "peak" memory of the currently running job (so I guess that if a job consumes a huge amount of memory even for a fraction of a second, there is still a non-zero probability that this expression evaluates to true).
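To illustrate the point about "lucky" evaluation, here is a minimal Python sketch, not HTCondor code. It assumes a made-up per-second memory trace and a hypothetical polling interval and offset; the names sampled_max and true_peak are invented for this example. It only shows that periodically reading a "current" value may or may not catch a short spike, while a peak counter always records it.

```python
def true_peak(usage_trace):
    """What a peak counter would report: the largest value ever reached."""
    return max(usage_trace)


def sampled_max(usage_trace, interval, offset):
    """Largest value seen when the trace is only read every `interval`
    points, starting at index `offset` (a stand-in for periodic polling)."""
    return max(usage_trace[offset::interval])


# Hypothetical trace in MB: a steady 4000 MB with a one-second spike to 14000 MB.
trace = [4000] * 100
trace[37] = 14000

print(true_peak(trace))           # peak counter always sees the spike
print(sampled_max(trace, 10, 0))  # polls at 0, 10, 20, ... and misses index 37
print(sampled_max(trace, 10, 7))  # polls at 7, 17, 27, 37, ... and hits the spike
```

With these made-up numbers, the same job looks like a 4000 MB job or a 14000 MB job depending only on when the value happens to be read.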


btw: I find it questionable to re-use an existing cgroup slot from previous jobs with stuck processes for a new HTCondor job, and I hope that the developers come up with a cleaner solution in the future...


Petr

On 11/21/24 10:11, Carles Acosta wrote:
Dear all,

We are running version 23.10.1 on all our EPs. We took the opportunity to add a memory limit again:

CGROUP_IGNORE_CACHE_MEMORY = True
MEMORY_EXCEEDED = (MemoryUsage isnt undefined && MemoryUsage > Memory*3)
use POLICY : WANT_HOLD_IF(MEMORY_EXCEEDED, 102, peak memory usage exceeded requested memory by 3 times)

The limit is generous, 3 times, because we first want to test how this evolves.
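For readers less familiar with ClassAd expressions, the logic of MEMORY_EXCEEDED can be restated as a small Python sketch. This is only an illustration of the comparison, not how HTCondor evaluates it: the function name memory_exceeded is invented here, None stands in for an undefined attribute, and the example numbers (in MB) are hypothetical.

```python
def memory_exceeded(memory_usage_mb, memory_mb, factor=3):
    """Mirror of: MemoryUsage isnt undefined && MemoryUsage > Memory*3.

    memory_usage_mb: reported usage, or None when the attribute is undefined.
    memory_mb: provisioned memory (the Memory attribute in the expression).
    """
    if memory_usage_mb is None:
        # Undefined usage never triggers the hold.
        return False
    return memory_usage_mb > memory_mb * factor


# Hypothetical values in MB, chosen only for illustration.
print(memory_exceeded(None, 4096))   # usage not yet reported
print(memory_exceeded(12288, 4096))  # exactly 3x is not "greater than"
print(memory_exceeded(14000, 4096))  # above 3x would put the job on hold
```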

After 3 weeks, it is clear that we do not have the huge overestimation of memory usage we saw in the past. However, it seems that the MEMORY_EXCEEDED expression is generating some false positives. For instance, the same job was submitted twice: the first time it showed a memory usage of 14 GB, and the second time it showed a regular memory usage of 4 GB. I understand that this is the cgroups memory.peak, right? For CentOS 7 or cgroups v1, was the same max value considered (memory.max_usage_in_bytes) or the current one (memory.usage_in_bytes)?
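For comparing the counters named above by hand, a minimal Python sketch could read both the current and the peak value from a cgroup directory. It says nothing about which file HTCondor itself reads; the function name read_cgroup_memory is invented for this example, and the directory you pass in is an assumption that depends on how cgroups are mounted on your EP. On cgroup v2 kernels that do not provide memory.peak, the peak is returned as None.

```python
from pathlib import Path

# Kernel file names for the two counters on each cgroup version.
V2_FILES = {"current": "memory.current", "peak": "memory.peak"}
V1_FILES = {"current": "memory.usage_in_bytes", "peak": "memory.max_usage_in_bytes"}


def read_cgroup_memory(cgroup_dir):
    """Return {'current': bytes, 'peak': bytes or None} for one cgroup directory."""
    cgroup_dir = Path(cgroup_dir)
    # Treat the directory as cgroup v2 if memory.current is present, else as v1.
    names = V2_FILES if (cgroup_dir / "memory.current").exists() else V1_FILES
    result = {}
    for key, name in names.items():
        path = cgroup_dir / name
        result[key] = int(path.read_text().strip()) if path.exists() else None
    return result
```

Calling it a few times on the cgroup of a running job and comparing 'current' with 'peak' shows how far the two values can drift apart for the same job.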

Does any other site use a limit like this? What is your experience?

Best regards,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice:  http://legal.ifae.es
