Re: [HTCondor-users] More thoughts on memory limits

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

946757.0 jtho long data-theorie-jthoe 11/21 07:19 1 0.999 32.0 GB 732.4 MB 732.4 MB C 13:03:43 13:10:42 wn-lot-045
946741.0 jtho long data-theorie-jthoe 11/21 06:55 1 1.000 32.0 GB 732.4 MB 732.4 MB C 13:03:53 13:04:36 wn-pijl-007
946581.0 jtho long data-theorie-jthoe 11/21 05:59 1 1.000 32.0 GB 732.4 MB 732.4 MB C 15:59:24 15:59:40 wn-lot-002
946889.0 jtho long data-theorie-jthoe 11/21 05:59 1 1.000 32.0 GB   9.8 MB   9.8 MB C        0 10:38:29 wn-lot-060
946732.0 jtho long data-theorie-jthoe 11/21 05:45 1 0.999 32.0 GB 732.4 MB 732.4 MB C 12:20:45 12:21:21 wn-pijl-004
946842.0 jtho long data-theorie-jthoe 11/21 05:23 1 0.997 32.0 GB   1.2 GB   1.4 GB C 10:38:52 10:41:09 wn-pijl-001
946440.0 jtho long data-theorie-jthoe 11/21 05:04 1 0.999 32.0 GB   1.2 GB   1.4 GB C 17:29:34 17:30:26 wn-pijl-006

You can see that for one of these lines (job 946889.0), the CPU_TIME is zero and the memory usage is significantly lower. I've seen this with my own test jobs: what the test jobs themselves report internally is the normal usage, so HTCondor is somehow not always getting the right usage numbers.

On 21 Nov 2024, at 10:11, Carles Acosta <cacosta@xxxxxx> wrote:

Dear all,

We are running version 23.10.1 on all our EPs. We took the opportunity to add a memory limit again:

CGROUP_IGNORE_CACHE_MEMORY = True
MEMORY_EXCEEDED = (MemoryUsage isnt undefined && MemoryUsage > Memory*3)
use POLICY : WANT_HOLD_IF(MEMORY_EXCEEDED, 102, peak memory usage exceeded requested memory by 3 times)

The limit is generous, 3 times, because we first want to test how this evolves.
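
For readers who want to check the logic of the MEMORY_EXCEEDED expression outside HTCondor, here is a minimal Python sketch of the same predicate. The function name and the example numbers are hypothetical, `None` stands in for the ClassAd value `undefined`, and both values are assumed to be in the same unit (MB).

```python
from typing import Optional


def memory_exceeded(memory_usage_mb: Optional[int],
                    memory_mb: int,
                    factor: int = 3) -> bool:
    """Mirror of: MemoryUsage isnt undefined && MemoryUsage > Memory*3.

    None plays the role of ClassAd `undefined` (an assumption of this sketch).
    """
    if memory_usage_mb is None:
        # No usage reported yet, so the job is never held on this basis.
        return False
    # Strictly greater than, as in the configuration expression.
    return memory_usage_mb > memory_mb * factor


# Hypothetical values: a slot with 2048 MB of memory.
print(memory_exceeded(None, 2048))   # usage not yet reported
print(memory_exceeded(6144, 2048))   # exactly 3x the slot memory
print(memory_exceeded(6145, 2048))   # just above 3x the slot memory
```

Because the comparison is strict, a job sitting at exactly three times the slot memory would not be held in this sketch; only usage above that would trigger the hold.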

After 3 weeks, it is clear that we do not have the huge overestimation of memory usage we saw in the past. However, it seems that the MEMORY_EXCEEDED expression is generating some false positives. For instance, the same job was submitted twice: the first time it showed a memory usage of 14 GB, and the second time it showed a regular memory usage of 4 GB. I understand that this is the cgroups memory.peak, right? For CentOS 7 or cgroups v1, was the same max value considered (memory.max_usage_in_bytes) or the current one (memory.usage_in_bytes)?
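
To compare the peak and current values by hand on a worker node, something like the following Python sketch could be used. It only reads the cgroup files named above (plus memory.current, the cgroup v2 counterpart of memory.usage_in_bytes) and makes no claim about which of them HTCondor itself consults. The helper names are hypothetical, and the cgroup directory of a given job has to be supplied by the caller.

```python
from pathlib import Path
from typing import Iterable, Optional

# File names for cgroup v2 first, then cgroup v1.
PEAK_FILES = ("memory.peak", "memory.max_usage_in_bytes")
CURRENT_FILES = ("memory.current", "memory.usage_in_bytes")


def read_first(cgroup_dir: Path, names: Iterable[str]) -> Optional[int]:
    """Return the integer in the first readable file, or None if none exists."""
    for name in names:
        try:
            return int((cgroup_dir / name).read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable file, or non-numeric content: try the next.
            continue
    return None


def memory_snapshot(cgroup_dir: str) -> dict:
    """Collect peak and current memory (bytes) for one cgroup directory."""
    base = Path(cgroup_dir)
    return {
        "peak_bytes": read_first(base, PEAK_FILES),
        "current_bytes": read_first(base, CURRENT_FILES),
    }
```

Calling `memory_snapshot()` on a job's cgroup directory while the job runs, and comparing the result with the MemoryUsage that HTCondor reports, would show whether the reported figure tracks the peak or the current value on that node.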

Does any other site use a limit like this? What is your experience?

Best regards,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
http://www.pic.es
Avís - Aviso - Legal Notice: http://legal.ifae.es

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
