
Re: [HTCondor-users] More thoughts on memory limits



Hi Jeff,

I would assume it depends on how often the system periodic expression is evaluated. During each evaluation it compares the memory consumption at that very moment with the maximum you defined.

Anything that happens in between will go unnoticed, I assume...
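If Christoph's assumption holds, and the check compares only the instantaneous value at each evaluation, a minimal Python sketch shows how a short spike can slip through. The functions are hypothetical, and the trace, limit, and intervals are made-up numbers for illustration. They are not HTCondor defaults or real job data.

```python
from typing import List, Sequence

def periodic_check(trace: Sequence[int], limit: int, interval: int) -> List[int]:
    """Times (seconds) at which a check that looks only at the instantaneous
    value, once every `interval` seconds, sees usage above `limit`."""
    return [t for t in range(0, len(trace), interval) if trace[t] > limit]

def true_violations(trace: Sequence[int], limit: int) -> List[int]:
    """Every second at which usage was actually above `limit`."""
    return [t for t, usage in enumerate(trace) if usage > limit]

# Hypothetical 10-minute trace in MiB, one sample per second:
# a steady 1000 MiB with a 10-second spike to 5000 MiB starting at t=130.
trace = [1000] * 600
trace[130:140] = [5000] * 10

print(periodic_check(trace, limit=2000, interval=60))  # [] (checks at 120 s and 180 s miss the spike)
print(periodic_check(trace, limit=2000, interval=10))  # [130]
print(len(true_violations(trace, limit=2000)))         # 10
```

Shortening the interval narrows the blind spot but cannot close it. Any sampled check can miss an excursion that starts and ends between two samples.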

Best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Jeff Templon" <templon@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 4 December 2024 11:18:12
Subject: Re: [HTCondor-users] More thoughts on memory limits

Hi,

I'm wondering if this change has effectively disabled the memory limits. I don't see any more held jobs with messages about memory being exceeded, but I do see job restarts with messages about "exhausted memory on worker node".

JT


> On 4 Dec 2024, at 07:10, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
> 
> Hi,
> 
> We definitely need the broken slot code ASAP, as we deal endlessly with unkillable job executables. I had just planned this morning to whine about it here ;)
> 
> We need the max memory usage back in the job ClassAds and history even more urgently. Couldn't you just add a new ClassAd attribute like memory.current and leave the old one as is?
> 
> Best
> christoph 
> 
> ----- Original Message -----
> From: "Greg Thain via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
> To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
> CC: "Greg Thain" <gthain@xxxxxxxxxxx>
> Sent: Monday, 2 December 2024 23:59:02
> Subject: Re: [HTCondor-users] More thoughts on memory limits
> 
> On 12/2/24 10:10 AM, Beyer, Christoph wrote:
>> Hi,
>> 
>> memory.current might be interesting for some, but memory.peak could nonetheless go into another job ClassAd attribute. Without access to it, memory management is pretty much impossible on many levels.
> 
> 
> Note that HTCondor today polls memory.current, keeps the peak value
> internally, and reports that peak in the job ad. The polling frequency
> is controlled by the knob STARTER_UPDATE_INTERVAL.
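Greg contrasts polling memory.current and keeping the maximum with reading the cgroup's own peak. A Python sketch of that difference follows. It assumes a cgroup v2 hierarchy in which memory.current and memory.peak each hold a single byte count, and memory.peak exists only on newer kernels. The function names are hypothetical. The `interval` argument merely plays the role of STARTER_UPDATE_INTERVAL. This is not HTCondor's starter code, and the location of a job's cgroup directory depends on the local HTCondor and systemd setup.

```python
import time
from pathlib import Path
from typing import Optional

def read_counter(path: Path) -> int:
    """cgroup v2 memory files like these hold a single integer byte count."""
    return int(path.read_text().strip())

def polled_peak(cgroup_dir: Path, interval: float, polls: int) -> int:
    """Sample memory.current `polls` times, `interval` seconds apart, and keep
    the largest value seen. A spike that falls between samples is never seen."""
    peak = 0
    for i in range(polls):
        peak = max(peak, read_counter(cgroup_dir / "memory.current"))
        if i + 1 < polls:
            time.sleep(interval)
    return peak

def kernel_peak(cgroup_dir: Path) -> Optional[int]:
    """High-water mark maintained by the kernel itself, or None if this
    kernel does not provide memory.peak."""
    try:
        return read_counter(cgroup_dir / "memory.peak")
    except FileNotFoundError:
        return None
```

The polled maximum can only ever be as large as the biggest sample it happened to take. The kernel tracks memory.peak continuously, so it also captures excursions between polls.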
> 
> We are adding support for the notion of a "broken" slot, so that if 
> there is an unkillable process, the slot will go into the "broken" 
> state.  When this goes in, I think we can go back to using the 
> cgroup.peak memory usage and reporting that.
> 
> 
> -greg
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> 
> The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/