Re: [HTCondor-users] More thoughts on memory limits
- Date: Wed, 18 Dec 2024 01:34:06 +0000
- From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] More thoughts on memory limits
Awesome -- seems like it should be simple enough to reproduce your findings! We are on the case.
Thanks for helping with this,
Brian
> On Dec 16, 2024, at 3:58 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
>
> Hi,
>
> my pleasure to be a pain in your !$#$^@& ;)
>
> I use the stress binary that comes with EL9:
>
> stress-1.0.4-29.el9.x86_64
>
> [root@batch1074 ~]# /usr/bin/stress --vm 1 --vm-bytes 1000M -t 300
>
> [root@batch1074 ~]# ps -ef | grep stress
> root 163473 163391 0 10:12 pts/3 00:00:00 /usr/bin/stress --vm 1 --vm-bytes 1000M -t 300
> root 163474 163473 96 10:12 pts/3 00:01:09 /usr/bin/stress --vm 1 --vm-bytes 1000M -t 300
> root 163593 163391 0 10:13 pts/3 00:00:00 grep --color=auto stress
>
> [root@batch1074 ~]# grep -i VMpeak /proc/163474/status
> VmPeak: 1027524 kB
>
> Sending 100 jobs with this command reveals:
>
> [chbeyer@pal91]~/htcondor/testjobs% condor_history 3840735 -af memoryusage
> 367
> 1221
> 733
> 977
> 123
> 269
> 1221
> 733
> 1465
> 977
> 367
> 464
> 977
> 1221
> 1221
> 98
> 367
> 245
> 196
> 733
> 733
> 367
> 733
> 220
> 464
> 733
> 977
> 342
> 98
> 733
> 733
> 1221
> 733
> 977
> 977
> 977
> 30
> 367
> 416
> 391
> 391
> 1221
> 733
> 1221
> 1221
> 318
> 733
> 733
> 977
> 977
> 391
> 147
> 1221
> 1709
> 733
> 1221
> 977
> 1221
> 367
> 245
> 342
> 977
> 440
> 1221
> 977
> 416
> 733
> 489
> 977
> 977
> 733
> 733
> 1221
> 416
> 733
> 269
> 733
> 977
> 98
> 733
> 1221
> 220
> 733
> 733
> 196
> 977
> 342
> 489
> 342
> 464
> 293
> 977
> 733
> 733
> 440
> 464
> 733
> 440
>
> Which seems pretty random to me ;)
>
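> Not the actual submit file, but a sketch of the kind of submit description
> that would queue 100 such stress jobs for the test above (the request_memory
> value is an arbitrary example):
>
> universe       = vanilla
> executable     = /usr/bin/stress
> arguments      = --vm 1 --vm-bytes 1000M -t 300
> request_memory = 2048
> log            = stress.log
> queue 100
>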
> Best
> christoph
>
> --
> Christoph Beyer
> DESY Hamburg
> IT-Department
>
> Notkestr. 85
> Building 02b, Room 009
> 22607 Hamburg
>
> phone:+49-(0)40-8998-2317
> mail: christoph.beyer@xxxxxxx
>
> ----- Original Message -----
> From: "Brian Bockelman" <BBockelman@xxxxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Friday, 13 December 2024 22:25:49
> Subject: Re: [HTCondor-users] More thoughts on memory limits
>
> Hi Christoph!
>
> Thanks for being persistent on this...
>
> FWIW, here's the relevant lines of code (adjust for your favorite git tag as needed):
>
> https://github.com/htcondor/htcondor/blob/b3053ce78ae3daa2fbaace98db7a77d2839c867e/src/condor_utils/proc_family_direct_cgroup_v2.cpp#L731-L775
>
> Note that HTCondor is using, unless you change the default config:
>
> memory.current - inactive_file - inactive_anon
>
> I took a random statistical sampling of a single job and found:
>
> memory.current: 6313439232
> inactive_file: 4170027008
> inactive_anon: 1946050560
>
> So, the default code currently claims the job's usage is 197361664.
>
> Interestingly, diving through the source code, the cgroup V1 code (back in the 10.x series) used the following formula:
>
> total_rss + total_mapped_file + total_shmem
>
> If I did the equivalent calculation on the same job, I get:
>
> RSS: 1946034176
> Mapped file: 349024256
> shmem: 53248
>
> for a total of 2295111680.
>
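> For illustration, a small shell sketch of how one might recompute both
> figures by hand from a job's cgroup v2 files. The cgroup path is a
> placeholder, and the mapping of the old v1 counters onto the v2 memory.stat
> keys (anon, file_mapped, shmem) is an assumption, not HTCondor's code:
>
> CG=/sys/fs/cgroup/path/to/job/scope          # placeholder; use the slot's actual cgroup
> cur=$(cat "$CG/memory.current")
> get() { awk -v k="$1" '$1 == k {print $2}' "$CG/memory.stat"; }
> # default cgroup v2 accounting, per the formula above
> echo "v2-style: $(( cur - $(get inactive_file) - $(get inactive_anon) ))"
> # rough v1-style equivalent of total_rss + total_mapped_file + total_shmem
> echo "v1-style: $(( $(get anon) + $(get file_mapped) + $(get shmem) ))"
>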
> So... the plot thickens! It seems that we are currently reading out significantly different counters between the old code and new code. For example, "inactive_anon" is memory that your job has allocated but hasn't touched "in a while" (the definition of "in a while" is quite complicated). To me, it sounds like such memory *should* be charged to the job: because it's allocated by the process, even though it's unused (unlike the page cache) the kernel cannot reclaim the memory when needed.
>
> I'd personally lean more toward the old definition here. Greg, thoughts?
>
> Christoph -- great detective work. For the sample you copy below, what does HTCondor report?
>
> Brian
>
> PS -- if we want to do an "apples to apples" comparison, what binary do you use for "stress"? Would love to see it reproduced on this side of the ocean.
>
>> On Dec 12, 2024, at 6:19 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
>>
>> Hi Brian et al,
>>
>> sorry for the slight delay, I ran some more tests on the memory issue - here is what I do:
>>
>> - start a job with steady memory consumption (the stress binary, consuming roughly 1 GB of memory)
>> - on the worker I read the memory consumption from /proc (VmSize) and from the cgroup (memory.current) at a 10-second interval (a sketch of such a loop follows the log below)
>>
>> As you can see, memory.current oscillates wildly and is relatively useless:
>>
>> It seems to me it captures a lot more dynamics than we need (?)
>>
>> Thu Dec 12 01:16:34 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 788660224
>> Thu Dec 12 01:16:44 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 412344320
>> Thu Dec 12 01:16:54 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 7049216
>> Thu Dec 12 01:17:04 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 452481024
>> Thu Dec 12 01:17:14 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 496685056
>> Thu Dec 12 01:17:24 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 752803840
>> Thu Dec 12 01:17:34 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 702517248
>> Thu Dec 12 01:17:44 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 702472192
>> Thu Dec 12 01:17:54 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 1042812928
>> Thu Dec 12 01:18:04 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 206536704
>> Thu Dec 12 01:18:14 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 91041792
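>>
>> For reference, a minimal sketch of such a polling loop (the PID and the
>> cgroup path below are placeholders, not the actual script used):
>>
>> PID=163474                              # the stress worker's PID
>> CG=/sys/fs/cgroup/path/to/job/scope     # placeholder for the job's cgroup
>> while sleep 10; do
>>     echo "$(date) PROC: $(grep VmSize /proc/$PID/status) CGRP: $(cat $CG/memory.current)"
>> done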
>>
>> Best
>> christoph
>>
>> --
>> Christoph Beyer
>> DESY Hamburg
>> IT-Department
>>
>> Notkestr. 85
>> Building 02b, Room 009
>> 22607 Hamburg
>>
>> phone:+49-(0)40-8998-2317
>> mail: christoph.beyer@xxxxxxx
>>
>> ----- Original Message -----
>> From: "Brian Bockelman" <BBockelman@xxxxxxxxxxxxx>
>> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
>> Sent: Wednesday, 4 December 2024 15:14:44
>> Subject: Re: [HTCondor-users] More thoughts on memory limits
>>
>> Hi Christoph,
>>
>> From my relatively hazy memory, here's what I think the history is --
>>
>> - The original design, maybe a decade ago, was to use memory.peak.
>> - Using memory.peak was fairly quickly reverted because it was counting various things that individuals found surprising (such as page cache).
>> - Up until 2024, the memory usage was based on the largest recorded value of memory.current which was polled every few seconds.
>> - During the cgroupsv2 transition, another attempt to go to memory.peak was made (esp. as the measurements by the kernel were slightly different).
>> - The second attempt at memory.peak was also reverted -- the pinch point this time was handling of processes that couldn't be killed (which are likely from prior jobs but still affecting the peak memory measurement of the current jobs).
>> - So we now poll memory.current and record the peak value; this time using cgroupsv2 interfaces instead of v1.
>>
>> So, what you see today *should* be fairly close in spirit to the "max memory usage" recorded in 2023 (that is, it's approximately the maximum recorded value of memory.current polled every 5 seconds across the job lifetime). If that's not the behavior being observed (esp. if you see MemoryUsage ever go *down*), then that's indeed a horribly surprising bug.
>>
>> If you wanted to see the current memory usage of the job, we would have to add a new attribute to show that!
>>
>> Hope the trip down memory lane is useful,
>>
>> Brian
>>
>>> On Dec 4, 2024, at 12:10 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> we definitely need the broken slot code ASAP, as we deal endlessly with unkillable job executables. I had just planned this morning to whine about it here ;)
>>>
>>> We need the max memory usage back in the job ClassAds and history even more urgently - couldn't you just add a new attribute like memory.current and leave the old one as is?
>>>
>>> Best
>>> christoph
>>>
>>> --
>>> Christoph Beyer
>>> DESY Hamburg
>>> IT-Department
>>>
>>> Notkestr. 85
>>> Building 02b, Room 009
>>> 22607 Hamburg
>>>
>>> phone:+49-(0)40-8998-2317
>>> mail: christoph.beyer@xxxxxxx
>>>
>>> ----- Original Message -----
>>> From: "Greg Thain via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
>>> To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
>>> CC: "Greg Thain" <gthain@xxxxxxxxxxx>
>>> Sent: Monday, 2 December 2024 23:59:02
>>> Subject: Re: [HTCondor-users] More thoughts on memory limits
>>>
>>> On 12/2/24 10:10 AM, Beyer, Christoph wrote:
>>>> Hi,
>>>>
>>>> memory.current might be interesting for someone, but memory.peak could nonetheless go into another job ClassAd attribute - not having access to it makes memory management pretty much impossible on many levels ?
>>>
>>>
>>> Note that what happens today is that HTCondor polls memory.current,
>>> keeps the peak value internally, and reports that peak in the job
>>> ad. The polling frequency is controlled by the knob
>>> STARTER_UPDATE_INTERVAL.
>>>
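>>> For illustration only, the knob goes in the execute node's condor_config;
>>> the value is in seconds, and 10 here is just an example, not a recommendation:
>>>
>>> STARTER_UPDATE_INTERVAL = 10
>>>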
>>> We are adding support for the notion of a "broken" slot, so that if
>>> there is an unkillable process, the slot will go into the "broken"
>>> state. When this goes in, I think we can go back to using the
>>> cgroup memory.peak usage and reporting that.
>>>
>>>
>>> -greg
>>>
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>>
>>> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>>
>> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
>
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
>
> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/