Re: [HTCondor-users] More thoughts on memory limits



Awesome -- seems like this should be simple enough to reproduce your findings!  We are on the case.

Thanks for helping with this,

Brian

> On Dec 16, 2024, at 3:58 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
> 
> Hi,
> 
> my pleasure to be a pain in your !$#$^@& ;) 
> 
> I use the stress binary that comes with EL9 : 
> 
> stress-1.0.4-29.el9.x86_64
> 
> [root@batch1074 ~]# /usr/bin/stress --vm 1 --vm-bytes 1000M -t 300
> 
> [root@batch1074 ~]# ps -ef | grep stress
> root      163473  163391  0 10:12 pts/3    00:00:00 /usr/bin/stress --vm 1 --vm-bytes 1000M -t 300
> root      163474  163473 96 10:12 pts/3    00:01:09 /usr/bin/stress --vm 1 --vm-bytes 1000M -t 300
> root      163593  163391  0 10:13 pts/3    00:00:00 grep --color=auto stress
> 
> [root@batch1074 ~]# grep -i VMpeak /proc/163474/status 
> VmPeak:	 1027524 kB
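> 
> (A minimal submit-file sketch for this kind of test -- file name, request_memory and log are illustrative assumptions:)
> 
> cat > stress.sub <<'EOF'
> executable     = /usr/bin/stress
> arguments      = --vm 1 --vm-bytes 1000M -t 300
> request_memory = 2000
> log            = stress.log
> queue 100
> EOF
> condor_submit stress.sub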
> 
> Sending 100 jobs with this command reveals: 
> 
> [chbeyer@pal91]~/htcondor/testjobs% condor_history  3840735 -af memoryusage
> 367
> 1221
> 733
> 977
> 123
> 269
> 1221
> 733
> 1465
> 977
> 367
> 464
> 977
> 1221
> 1221
> 98
> 367
> 245
> 196
> 733
> 733
> 367
> 733
> 220
> 464
> 733
> 977
> 342
> 98
> 733
> 733
> 1221
> 733
> 977
> 977
> 977
> 30
> 367
> 416
> 391
> 391
> 1221
> 733
> 1221
> 1221
> 318
> 733
> 733
> 977
> 977
> 391
> 147
> 1221
> 1709
> 733
> 1221
> 977
> 1221
> 367
> 245
> 342
> 977
> 440
> 1221
> 977
> 416
> 733
> 489
> 977
> 977
> 733
> 733
> 1221
> 416
> 733
> 269
> 733
> 977
> 98
> 733
> 1221
> 220
> 733
> 733
> 196
> 977
> 342
> 489
> 342
> 464
> 293
> 977
> 733
> 733
> 440
> 464
> 733
> 440
> 
> Which seems pretty random to me ;) 
> 
> Best
> christoph
> 
> -- 
> Christoph Beyer
> DESY Hamburg
> IT-Department
> 
> Notkestr. 85
> Building 02b, Room 009
> 22607 Hamburg
> 
> phone:+49-(0)40-8998-2317
> mail: christoph.beyer@xxxxxxx
> 
> ----- Original Message -----
> From: "Brian Bockelman" <BBockelman@xxxxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Friday, 13 December 2024, 22:25:49
> Subject: Re: [HTCondor-users] More thoughts on memory limits
> 
> Hi Christoph!
> 
> Thanks for being persistent on this...
> 
> FWIW, here are the relevant lines of code (adjust for your favorite git tag as needed):
> 
> https://github.com/htcondor/htcondor/blob/b3053ce78ae3daa2fbaace98db7a77d2839c867e/src/condor_utils/proc_family_direct_cgroup_v2.cpp#L731-L775
> 
> Note that, unless you change the default config, HTCondor is using:
> 
> memory.current - inactive_file - inactive_anon
> 
> I took a random statistical sampling of a single job and found:
> 
> memory.current: 6313439232
> inactive_file: 4170027008
> inactive_anon: 1946050560
> 
> So, the default code currently claims the job's usage is 197361664 bytes.
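> 
> (For reference, a shell sketch of that readout -- the pid is passed as an argument and the cgroup path is derived from /proc, assuming a pure cgroup-v2 host:)
> 
> PID=$1                                   # a pid inside the job's cgroup
> CG=/sys/fs/cgroup$(grep '^0::' /proc/$PID/cgroup | cut -d: -f3)
> cur=$(cat "$CG/memory.current")
> inact_file=$(awk '$1=="inactive_file"{print $2}' "$CG/memory.stat")
> inact_anon=$(awk '$1=="inactive_anon"{print $2}' "$CG/memory.stat")
> echo "default-config usage: $(( cur - inact_file - inact_anon )) bytes"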
> 
> Interestingly, diving through the source code, the cgroup V1 code (back in the 10.x series) used the following formula:
> 
> total_rss + total_mapped_file + total_shmem
> 
> If I did the equivalent calculation on the same job, I get:
> 
> RSS: 1946034176
> Mapped file: 349024256
> shmem: 53248
> 
> for a total of 2295111680.
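> 
> (The cgroup-v1 equivalent, as a sketch -- assumes a host with the v1 memory controller mounted at the usual location; the job cgroup path is left as a placeholder:)
> 
> CG1=/sys/fs/cgroup/memory/<job_cgroup>   # placeholder path on a v1 host
> awk '$1=="total_rss" || $1=="total_mapped_file" || $1=="total_shmem" {sum+=$2}
>      END {print "v1-style usage:", sum, "bytes"}' "$CG1/memory.stat"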
> 
> So... the plot thickens!  It seems that we are currently reading out significantly different counters in the old code and the new code.  For example, "inactive_anon" is memory that your job has allocated but hasn't touched "in a while" (the definition of "in a while" is quite complicated).  To me, it sounds like such memory *should* be charged to the job: because it's allocated by the process, the kernel cannot reclaim it when needed, even though it's unused (unlike the page cache, which the kernel can reclaim).
> 
> I'd personally lean more toward the old definition here.  Greg, thoughts?
> 
> Christoph -- great detective work.  For the sample you copy below, what does HTCondor report?
> 
> Brian
> 
> PS -- if we want to do an "apples to apples" comparison, what binary do you use for "stress"?  Would love to see it reproduced on this side of the ocean.
> 
>> On Dec 12, 2024, at 6:19 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
>> 
>> Hi Brian et al,
>> 
>> sorry for the slight delay; I made some more tests on the memory issue - here is what I do: 
>> 
>> - start a steady memory-consumption job (the stress binary, consuming roughly 1 GB of memory) 
>> - on the worker, read the memory consumption from /proc (VmSize) and the cgroup (memory.current) in a 10-second interval (see the sketch below)
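>> 
>> (Roughly a loop like this -- the pid is passed as an argument and the cgroup path is an assumption about a pure cgroup-v2 host:)
>> 
>> PID=$1                                  # pid of the memory-consuming process
>> CG=/sys/fs/cgroup$(grep '^0::' /proc/$PID/cgroup | cut -d: -f3)
>> while kill -0 $PID 2>/dev/null; do
>>     echo "$(date) PROC: $(awk '/^VmSize/{print $1, $2, $3}' /proc/$PID/status) CGRP: $(cat $CG/memory.current)"
>>     sleep 10
>> done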
>> 
>> As you can see, memory.current is relatively useless / oscillates wildly: 
>> 
>> It seems to me there is a lot more dynamic behaviour in it than we need (?)
>> 
>> Thu Dec 12 01:16:34 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 788660224
>> Thu Dec 12 01:16:44 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 412344320
>> Thu Dec 12 01:16:54 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 7049216
>> Thu Dec 12 01:17:04 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 452481024
>> Thu Dec 12 01:17:14 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 496685056
>> Thu Dec 12 01:17:24 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 752803840
>> Thu Dec 12 01:17:34 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 702517248
>> Thu Dec 12 01:17:44 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 702472192
>> Thu Dec 12 01:17:54 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 1042812928
>> Thu Dec 12 01:18:04 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 206536704
>> Thu Dec 12 01:18:14 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 91041792
>> 
>> Best
>> christoph
>> 
>> -- 
>> Christoph Beyer
>> DESY Hamburg
>> IT-Department
>> 
>> Notkestr. 85
>> Building 02b, Room 009
>> 22607 Hamburg
>> 
>> phone:+49-(0)40-8998-2317
>> mail: christoph.beyer@xxxxxxx
>> 
>> ----- Original Message -----
>> From: "Brian Bockelman" <BBockelman@xxxxxxxxxxxxx>
>> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
>> Sent: Wednesday, 4 December 2024, 15:14:44
>> Subject: Re: [HTCondor-users] More thoughts on memory limits
>> 
>> Hi Christoph,
>> 
>> From my relatively hazy memory, here's what I think the history is --
>> 
>> - The original design, maybe a decade ago, was to use memory.peak.
>> - Using memory.peak was fairly quickly reverted because it was counting various things that individuals found surprising (such as page cache).
>> - Up until 2024, the memory usage was based on the largest recorded value of memory.current which was polled every few seconds.
>> - During the cgroupsv2 transition, another attempt to go to memory.peak was made (esp. as the measurements by the kernel were slightly different).
>> - The second attempt at memory.peak was also reverted -- the pinch point this time was handling of processes that couldn't be killed (which are likely from prior jobs but still affecting the peak memory measurement of the current jobs).
>> - So we now poll memory.current and record the peak value; this time using cgroupsv2 interfaces instead of v1.
>> 
>> So, what you see today *should* be fairly close in spirit to the "max memory usage" recorded in 2023 (that is, it's approximately the maximum recorded value of memory.current polled every 5 seconds across the job lifetime).  If that's not the behavior being observed (esp. if you see MemoryUsage ever go *down*), then that's indeed a horribly surprising bug.
>> 
>> If you wanted to see the current memory usage of the job, we would have to add a new attribute to show that!
>> 
>> Hope the trip down memory lane is useful,
>> 
>> Brian
>> 
>>> On Dec 4, 2024, at 12:10 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
>>> 
>>> Hi,
>>> 
>>> we definitely need the broken-slot code ASAP, as we deal endlessly with unkillable job executables. I had just planned this morning to whine about it here ;) 
>>> 
>>> Even more urgently, we need the max memory usage back in the job ClassAds and the history - couldn't you just add a new ClassAd attribute like memory.current and leave the old one as is? 
>>> 
>>> Best
>>> christoph 
>>> 
>>> -- 
>>> Christoph Beyer
>>> DESY Hamburg
>>> IT-Department
>>> 
>>> Notkestr. 85
>>> Building 02b, Room 009
>>> 22607 Hamburg
>>> 
>>> phone:+49-(0)40-8998-2317
>>> mail: christoph.beyer@xxxxxxx
>>> 
>>> ----- Original Message -----
>>> From: "Greg Thain via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
>>> To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
>>> CC: "Greg Thain" <gthain@xxxxxxxxxxx>
>>> Sent: Monday, 2 December 2024, 23:59:02
>>> Subject: Re: [HTCondor-users] More thoughts on memory limits
>>> 
>>> On 12/2/24 10:10 AM, Beyer, Christoph wrote:
>>>> Hi,
>>>> 
>>>> memory.current might be interesting for someone, but memory.peak could nonetheless go into another job ClassAd - not having access to it makes memory management pretty much impossible on many levels?
>>> 
>>> 
>>> Note that what HTCondor does today is poll memory.current, keep the peak 
>>> value internally, and report that peak in the job ad.  The polling 
>>> frequency is controlled by the knob STARTER_UPDATE_INTERVAL.
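>>> 
>>> (In shell terms, roughly this -- a sketch of the idea, not the actual implementation; the cgroup path is a placeholder:)
>>> 
>>> CG=/sys/fs/cgroup/<job_cgroup>           # placeholder for the job's cgroup
>>> INTERVAL=10                              # illustrative value; the real cadence follows STARTER_UPDATE_INTERVAL
>>> peak=0
>>> while sleep "$INTERVAL"; do
>>>     cur=$(cat "$CG/memory.current")
>>>     [ "$cur" -gt "$peak" ] && peak=$cur
>>>     echo "MemoryUsage reported so far (peak): $peak bytes"
>>> done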
>>> 
>>> We are adding support for the notion of a "broken" slot, so that if 
>>> there is an unkillable process, the slot will go into the "broken" 
>>> state.  When this goes in, I think we can go back to using the 
>>> cgroup memory.peak usage and reporting that.
>>> 
>>> 
>>> -greg
>>> 
>>> 