
Re: [HTCondor-users] More thoughts on memory limits



Hi,

we definitely need the broken slot code ASAP, as we deal endlessly with unkillable job executables. I had planned just this morning to whine about it here ;)

Even more urgently, we need the max memory usage back in the job ClassAds and history. Couldn't you just add a new ClassAd attribute like memory.current and leave the old one as is?

Best
christoph 

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Greg Thain via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
CC: "Greg Thain" <gthain@xxxxxxxxxxx>
Sent: Monday, 2 December 2024 23:59:02
Subject: Re: [HTCondor-users] More thoughts on memory limits

On 12/2/24 10:10 AM, Beyer, Christoph wrote:
> Hi,
>
>   memory.current might be interesting for someone, but memory.peak could nonetheless go into another job ClassAd attribute. Not having access to it makes memory management pretty much impossible on many levels.


Note that HTCondor today polls memory.current, keeps the peak value 
internally, and reports that peak in the job ad. The polling frequency 
is controlled by the knob STARTER_UPDATE_INTERVAL.
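
To illustrate the polling approach described above, here is a minimal Python sketch. It is not HTCondor's actual code: the names read_memory_current and PeakTracker are hypothetical, and it assumes a cgroup v2 directory containing a memory.current file. Each call to poll() stands in for one update interval.

```python
from pathlib import Path
from typing import Callable


def read_memory_current(cgroup_dir: str) -> int:
    """Read current memory usage in bytes (assumes a cgroup v2 directory)."""
    return int(Path(cgroup_dir, "memory.current").read_text().strip())


class PeakTracker:
    """Hypothetical helper: remembers the largest value seen across polls."""

    def __init__(self, read_current: Callable[[], int]) -> None:
        self._read_current = read_current
        self.peak = 0

    def poll(self) -> int:
        """Take one sample and return the peak observed so far."""
        current = self._read_current()
        if current > self.peak:
            self.peak = current
        return self.peak


if __name__ == "__main__":
    # Simulated samples; a real caller would pass
    # lambda: read_memory_current("<path to the job's cgroup>").
    samples = iter([100, 300, 200])
    tracker = PeakTracker(lambda: next(samples))
    for _ in range(3):
        tracker.poll()
    print(tracker.peak)
```

One consequence of this design is that a short spike falling entirely between two polls is never sampled, so a peak derived from polling can be lower than the true peak.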

We are adding support for the notion of a "broken" slot, so that if 
there is an unkillable process, the slot will go into the "broken" 
state. When this goes in, I think we can go back to using the 
cgroup.peak memory usage and reporting that.
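
For comparison, a kernel-maintained peak can be read directly instead of being derived from polling. The sketch below assumes that the peak value referred to above corresponds to the cgroup v2 memory.peak file, which only exists on sufficiently recent kernels; the function name read_memory_peak is hypothetical and this is not HTCondor's implementation.

```python
from pathlib import Path
from typing import Optional


def read_memory_peak(cgroup_dir: str) -> Optional[int]:
    """Return the kernel-recorded peak memory usage in bytes.

    Assumes a cgroup v2 directory. Returns None when memory.peak is
    absent, for example on kernels that do not provide it.
    """
    peak_file = Path(cgroup_dir, "memory.peak")
    if not peak_file.exists():
        return None
    return int(peak_file.read_text().strip())
```

A caller could fall back to a polled peak whenever this returns None.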


-greg


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/