This may be related to an issue we're seeing here with capturing resource usage. See, for example, the following:
946757.0  jtho  long  data-theorie-jthoe  11/21 07:19  1  0.999  32.0 GB  732.4 MB  732.4 MB  C  13:03:43  13:10:42  wn-lot-045
946741.0  jtho  long  data-theorie-jthoe  11/21 06:55  1  1.000  32.0 GB  732.4 MB  732.4 MB  C  13:03:53  13:04:36  wn-pijl-007
946581.0  jtho  long  data-theorie-jthoe  11/21 05:59  1  1.000  32.0 GB  732.4 MB  732.4 MB  C  15:59:24  15:59:40  wn-lot-002
946889.0  jtho  long  data-theorie-jthoe  11/21 05:59  1  1.000  32.0 GB    9.8 MB    9.8 MB  C         0  10:38:29  wn-lot-060
946732.0  jtho  long  data-theorie-jthoe  11/21 05:45  1  0.999  32.0 GB  732.4 MB  732.4 MB  C  12:20:45  12:21:21  wn-pijl-004
946842.0  jtho  long  data-theorie-jthoe  11/21 05:23  1  0.997  32.0 GB    1.2 GB    1.4 GB  C  10:38:52  10:41:09  wn-pijl-001
946440.0  jtho  long  data-theorie-jthoe  11/21 05:04  1  0.999  32.0 GB    1.2 GB    1.4 GB  C  17:29:34  17:30:26  wn-pijl-006
For one of these lines (job 946889.0), the CPU_TIME is zero and the memory usage is much lower (9.8 MB, versus roughly 732 MB to 1.4 GB for the others). I've seen the same with my own test jobs: the jobs themselves report normal usage internally, so HTCondor is somehow not always getting the right usage numbers.
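To pick records like this out of a longer history dump, one option is to parse each line and flag jobs whose CPU time is zero. This is a minimal sketch that assumes the column layout shown above (job ID first, CPU time third from last as either `0` or `H:MM:SS`, execute host last); that layout and the function names are assumptions for illustration, not an HTCondor output specification.

```python
def cpu_seconds(field: str) -> int:
    """Convert a CPU-time field ("0" or "H:MM:SS") to seconds."""
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def zero_cpu_jobs(lines):
    """Return (job_id, host) pairs for records whose CPU time is zero.

    Assumed (hypothetical) layout: whitespace-separated columns with the
    job ID first, CPU time third from last and the execute host last.
    """
    flagged = []
    for line in lines:
        cols = line.split()
        if len(cols) < 4:
            continue  # skip blank or truncated lines
        job_id, cpu, host = cols[0], cols[-3], cols[-1]
        if cpu_seconds(cpu) == 0:
            flagged.append((job_id, host))
    return flagged


history = [
    "946757.0  jtho  long  data-theorie-jthoe  11/21 07:19  1  0.999  32.0 GB  732.4 MB  732.4 MB  C  13:03:43  13:10:42  wn-lot-045",
    "946889.0  jtho  long  data-theorie-jthoe  11/21 05:59  1  1.000  32.0 GB    9.8 MB    9.8 MB  C         0  10:38:29  wn-lot-060",
]
print(zero_cpu_jobs(history))  # [('946889.0', 'wn-lot-060')]
```

Grouping the flagged jobs by execute host is one way to check whether the problem clusters on particular worker nodes.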
JT
Dear all,
We are running version 23.10.1 on all our EPs, and we took the opportunity to add a memory limit again:
CGROUP_IGNORE_CACHE_MEMORY = True
MEMORY_EXCEEDED = (MemoryUsage isnt undefined && MemoryUsage > Memory*3)
use POLICY : WANT_HOLD_IF(MEMORY_EXCEEDED, 102, peak memory usage exceeded requested memory by 3 times)
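The hold condition above reduces to a simple comparison. The following is a minimal Python sketch of what MEMORY_EXCEEDED evaluates to, assuming MemoryUsage and Memory are in the same units (HTCondor reports both in MiB) and using `None` to stand in for an undefined MemoryUsage. The function name and the 4 GB request in the example are hypothetical, and this models only this one expression, not general ClassAd semantics.

```python
from typing import Optional


def memory_exceeded(memory_usage_mb: Optional[float], memory_mb: float,
                    factor: float = 3.0) -> bool:
    """Mirror MEMORY_EXCEEDED: MemoryUsage isnt undefined && MemoryUsage > Memory*3.

    memory_usage_mb=None represents an undefined MemoryUsage, in which
    case the policy never fires.
    """
    if memory_usage_mb is None:
        return False
    # Strict ">": usage of exactly 3x the slot memory does not trigger a hold.
    return memory_usage_mb > memory_mb * factor


# Hypothetical 4 GB (4096 MiB) request: a 14 GB reading trips the policy, 4 GB does not.
print(memory_exceeded(14 * 1024, 4 * 1024))  # True
print(memory_exceeded(4 * 1024, 4 * 1024))   # False
print(memory_exceeded(None, 4 * 1024))       # False
```

Because the check is a single threshold on one reported value, a single inflated MemoryUsage reading is enough to put a job on hold.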
The limit is deliberately generous (3 times the requested memory) because we first want to see how this evolves.
After 3 weeks, it is clear that we no longer see the huge overestimation of memory usage we saw in the past. However, the MEMORY_EXCEEDED expression seems to generate some false positives. For instance, the same job was submitted twice: the first run showed a memory usage of 14 GB, and the second a normal memory usage of 4 GB. I understand that this value comes from the cgroup's memory.peak, right? On CentOS 7 (cgroups v1), was the equivalent maximum value used (memory.max_usage_in_bytes) or the current value (memory.usage_in_bytes)?
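For reference, the two cgroup versions expose peak and current memory usage under different file names. This minimal sketch reads both from a cgroup directory; the function names and the directory you would pass in are hypothetical, and it only illustrates which file holds the peak versus the current value, not which one HTCondor actually reads.

```python
import os

# Peak vs. current memory-usage files for each cgroup version.
CGROUP_FILES = {
    "v2": {"peak": "memory.peak", "current": "memory.current"},
    "v1": {"peak": "memory.max_usage_in_bytes", "current": "memory.usage_in_bytes"},
}


def read_bytes(path):
    """Return the integer stored in a cgroup file, or None if it is absent."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None  # e.g. memory.peak is missing on older kernels


def memory_usage(cgroup_dir):
    """Read peak and current usage (bytes) from a cgroup directory.

    Treats the directory as cgroup v2 if memory.current exists, else as v1.
    """
    version = "v2" if os.path.exists(os.path.join(cgroup_dir, "memory.current")) else "v1"
    files = CGROUP_FILES[version]
    return {
        "version": version,
        "peak": read_bytes(os.path.join(cgroup_dir, files["peak"])),
        "current": read_bytes(os.path.join(cgroup_dir, files["current"])),
    }
```

Comparing the two values for a running job's cgroup shows how far a transient peak can sit above the current usage, which is the difference that matters for a policy keyed on the peak.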
Does any other site use a limit like this? What is your experience?
Best regards,
Carles
-- Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
_______________________________________________
HTCondor-users mailing list
The archives can be found at:
https://www-auth.cs.wisc.edu/lists/htcondor-users/