
Re: [HTCondor-users] More thoughts on memory limits



Hi Christoph!

Thanks for being persistent on this...

FWIW, here are the relevant lines of code (adjust for your favorite git tag as needed):

https://github.com/htcondor/htcondor/blob/b3053ce78ae3daa2fbaace98db7a77d2839c867e/src/condor_utils/proc_family_direct_cgroup_v2.cpp#L731-L775

Note that, unless you change the default config, HTCondor is using:

memory.current - inactive_file - inactive_anon

I took a random statistical sampling of a single job and found:

memory.current: 6313439232
inactive_file: 4170027008
inactive_anon: 1946050560

So, the default code currently claims the job's usage is 197361664 bytes (about 188 MiB).
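
A rough sketch of checking that arithmetic against a live cgroup (the path below is a placeholder; point it at the job's actual cgroup directory):

    # Sketch: reproduce HTCondor's default usage estimate for a cgroup v2 job.
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/htcondor/some_job")  # placeholder path

    stat = {}
    for line in (cg / "memory.stat").read_text().splitlines():
        key, val = line.split()
        stat[key] = int(val)

    current = int((cg / "memory.current").read_text())
    usage = current - stat["inactive_file"] - stat["inactive_anon"]
    print(usage)  # here: 6313439232 - 4170027008 - 1946050560 = 197361664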

Interestingly, diving through the source code, I found that the cgroup v1 code (back in the 10.x series) used the following formula:

total_rss + total_mapped_file + total_shmem

If I do the equivalent calculation on the same job, I get:

RSS: 1946034176
Mapped file: 349024256
shmem: 53248

for a total of 2295111680 bytes (about 2.1 GiB).
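
Approximating the old formula from the same cgroup v2 memory.stat looks roughly like this; note that mapping v1's total_rss/total_mapped_file/total_shmem onto v2's anon/file_mapped/shmem counters is my assumption here, not something HTCondor does:

    # Sketch: the cgroup v1-era formula, approximated from v2 counters.
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/htcondor/some_job")  # placeholder path

    stat = {}
    for line in (cg / "memory.stat").read_text().splitlines():
        key, val = line.split()
        stat[key] = int(val)

    # v1 -> v2 counter mapping is an assumption for illustration
    old_style = stat["anon"] + stat["file_mapped"] + stat["shmem"]
    print(old_style)  # here: 1946034176 + 349024256 + 53248 = 2295111680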

So... the plot thickens!  It seems that we are currently reading out significantly different counters between the old code and the new code.  For example, "inactive_anon" is memory that your job has allocated but hasn't touched "in a while" (the definition of "in a while" is quite complicated).  To me, it sounds like such memory *should* be charged to the job: because it's allocated by the process, the kernel cannot reclaim it when memory is needed, even though it's unused (unlike the page cache).

I'd personally lean more toward the old definition here.  Greg, thoughts?

Christoph -- great detective work.  For the sample you copy below, what does HTCondor report?

Brian

PS -- if we want to do an "apples to apples" comparison, what binary do you use for "stress"?  Would love to see it reproduced on this side of the ocean.
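
On this side I'd try something like the following (assuming the classic "stress" tool; the flags and the cgroup path are guesses at your setup):

    # Sketch: steady ~1 GiB allocation plus a 10-second sampling loop.
    import subprocess
    import time
    from pathlib import Path

    # stress forks a VM worker child, so the ~1 GiB shows up in the
    # worker's VmSize, not the parent's; sampling the cgroup avoids that.
    proc = subprocess.Popen(["stress", "--vm", "1", "--vm-bytes", "1G", "--vm-keep"])
    cg = Path("/sys/fs/cgroup/htcondor/some_job")  # placeholder path

    try:
        for _ in range(12):
            time.sleep(10)
            current = (cg / "memory.current").read_text().strip()
            print(time.ctime(), "CGRP:", current)
    finally:
        proc.terminate()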

> On Dec 12, 2024, at 6:19 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
> 
> Hi Brian et al,
> 
> sorry for the slight delay, I ran some more tests on the memory issue - here is what I do: 
> 
> - start a steady memory-consumption job (stress binary, consumes roughly 1 GB of memory) 
> - on the worker I read the memory consumption from /proc (Vmsize) and CGROUP (memory.current) in a 10 sec interval
> 
> As you can see, memory.current is relatively useless, oscillating wildly: 
> 
> It seems to me there is a lot more fluctuation in it than we need (?)
> 
> Thu Dec 12 01:16:34 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 788660224
> Thu Dec 12 01:16:44 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 412344320
> Thu Dec 12 01:16:54 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 7049216
> Thu Dec 12 01:17:04 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 452481024
> Thu Dec 12 01:17:14 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 496685056
> Thu Dec 12 01:17:24 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 752803840
> Thu Dec 12 01:17:34 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 702517248
> Thu Dec 12 01:17:44 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 702472192
> Thu Dec 12 01:17:54 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 1042812928
> Thu Dec 12 01:18:04 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 206536704
> Thu Dec 12 01:18:14 PM CET 2024 PROC: VmSize: 1027524 kB CGRP: 91041792
> 
> Best
> christoph
> 
> -- 
> Christoph Beyer
> DESY Hamburg
> IT-Department
> 
> Notkestr. 85
> Building 02b, Room 009
> 22607 Hamburg
> 
> phone:+49-(0)40-8998-2317
> mail: christoph.beyer@xxxxxxx
> 
> ----- Original Message -----
> From: "Brian Bockelman" <BBockelman@xxxxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Wednesday, December 4, 2024 15:14:44
> Subject: Re: [HTCondor-users] More thoughts on memory limits
> 
> Hi Christoph,
> 
> From my relatively hazy memory, here's what I think the history is --
> 
> - The original design, maybe a decade ago, was to use memory.peak.
> - Using memory.peak was fairly quickly reverted because it was counting various things that individuals found surprising (such as page cache).
> - Up until 2024, the memory usage was based on the largest recorded value of memory.current, which was polled every few seconds.
> - During the cgroupsv2 transition, another attempt to go to memory.peak was made (esp. as the measurements by the kernel were slightly different).
> - The second attempt at memory.peak was also reverted -- the pinch point this time was handling of processes that couldn't be killed (which are likely from prior jobs but still affecting the peak memory measurement of the current jobs).
> - So we now poll memory.current and record the peak value; this time using cgroupsv2 interfaces instead of v1.
> 
> So, what you see today *should* be fairly close in spirit to the "max memory usage" recorded in 2023 (that is, it's approximately the maximum recorded value of memory.current, polled every 5 seconds, across the job lifetime).  If that's not the behavior being observed (esp. if you see MemoryUsage ever go *down*), then that's indeed a horribly surprising bug.
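> 
> In code terms, the bookkeeping is roughly this (a toy sketch, not the actual starter code; the cgroup path is a placeholder):
> 
>     import time
>     from pathlib import Path
> 
>     cg = Path("/sys/fs/cgroup/htcondor/some_job")  # placeholder path
>     peak = 0
>     for _ in range(100):  # one iteration per poll
>         peak = max(peak, int((cg / "memory.current").read_text()))
>         time.sleep(5)  # the real interval is the knob Greg mentions below
>     print(peak)  # the reported MemoryUsage is derived from this peak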
> 
> If you wanted to see the current memory usage of the job, we would have to add a new attribute to show that!
> 
> Hope the trip down memory lane is useful,
> 
> Brian
> 
>> On Dec 4, 2024, at 12:10 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
>> 
>> Hi,
>> 
>> we definitely need the broken-slot code ASAP, as we deal endlessly with unkillable job executables. I was just planning to whine about it here this morning ;) 
>> 
>> Even more desperately, we need the max memory usage back in the job ClassAds and the history - couldn't you just add a new ClassAd attribute like memory.current and leave the old one as is? 
>> 
>> Best
>> christoph 
>> 
>> -- 
>> Christoph Beyer
>> DESY Hamburg
>> IT-Department
>> 
>> Notkestr. 85
>> Building 02b, Room 009
>> 22607 Hamburg
>> 
>> phone:+49-(0)40-8998-2317
>> mail: christoph.beyer@xxxxxxx
>> 
>> ----- Original Message -----
>> From: "Greg Thain via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
>> To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
>> CC: "Greg Thain" <gthain@xxxxxxxxxxx>
>> Sent: Monday, December 2, 2024 23:59:02
>> Subject: Re: [HTCondor-users] More thoughts on memory limits
>> 
>> On 12/2/24 10:10 AM, Beyer, Christoph wrote:
>>> Hi,
>>> 
>>> memory.current might be interesting for someone, but memory.peak could nonetheless go into another job ClassAd attribute - not having access to it makes memory management pretty much impossible on many levels?
>> 
>> 
>> Note that what happens today is that HTCondor polls memory.current,
>> keeps the peak value internally, and reports that peak in the job ad.
>> The polling frequency is controlled by the knob
>> STARTER_UPDATE_INTERVAL.
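>> 
>> For example, to poll more often than the default (300 seconds, if
>> memory serves):
>> 
>>     # condor_config
>>     STARTER_UPDATE_INTERVAL = 60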
>> 
>> We are adding support for the notion of a "broken" slot, so that if 
>> there is an unkillable process, the slot will go into the "broken" 
>> state.  When this goes in, I think we can go back to using the
>> cgroup's memory.peak and reporting that.
>> 
>> 
>> -greg
>> 
>> 
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> 
>> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> 
> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/