Re: [HTCondor-users] Preempt jobs which exceed their request_memory - but no parallel universe?
- Date: Wed, 4 Mar 2015 09:37:04 +0100
- From: Steffen Grunewald <Steffen.Grunewald@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Preempt jobs which exceed their request_memory - but no parallel universe?
On Tue, Mar 03, 2015 at 10:36:30AM -0600, Greg Thain wrote:
> On 03/03/2015 05:31 AM, Steffen Grunewald wrote:
> >I'm confused.
> >
> >I have a couple of users who underestimate the memory their jobs
> >would attempt to allocate, and as a result some worker nodes end
> >up swapping heavily.
> >I tried to get those jobs preempted, and sent back into the queue
> >with their updated (ImageSize) request_memory:
> >
> ># Let job use its declared amount of memory and some more
> >MEMORY_EXTRA = 2048
> >MEMORY_ALLOWED = (Memory + $(MEMORY_EXTRA)*Cpus)
> ># Get the current footprint
> >MEMORY_CURRENT = (ImageSize/1024)
> ># Exceeds expectations?
> >MEMORY_EXCEEDED = $(MEMORY_CURRENT) > $(MEMORY_ALLOWED)
> ># If exceeding, preempt
> >#[preset]PREEMPT = False
> >PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
> >WANT_SUSPEND = False
> >
> >
> This should all work. Can you wrap your PREEMPT expression in the
> debug() function like this:
>
> PREEMPT = debug($(PREEMPT) || ($(MEMORY_EXCEEDED)))
This will require some DEBUG settings as well, right? (And some disk space for the log.)
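A minimal sketch of those extra settings. It assumes that debug() writes its result to the log of the daemon evaluating the expression at the D_FULLDEBUG level, so the startd needs that level enabled. The log-size cap is an arbitrary example value meant to limit disk use:

```
# Log every evaluation of the preemption policy (replaces the earlier PREEMPT line)
PREEMPT = debug(($(PREEMPT)) || ($(MEMORY_EXCEEDED)))
# Assumption: debug() output appears at D_FULLDEBUG in the StartLog
STARTD_DEBUG = D_FULLDEBUG
# Rotate the StartLog at roughly 100 MB (example value, in bytes)
MAX_STARTD_LOG = 100000000
```

After running condor_reconfig on the worker node, the startd should reread the configuration, and the evaluated PREEMPT results should then show up in its StartLog.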
> What are WANT_VACATE and KILL set to? If you don't want to give
> these jobs a grace period, you probably want WANT_VACATE = false.
Certainly (that's been the policy for years):
$ condor_config_val -dump | grep -i WANT_
WANT_SUSPEND = False
WANT_UDP_COMMAND_SOCKET = true
WANT_VACATE = False
WANT_XML_LOG = false
$ condor_config_val -dump | grep -i KILL
KILL = False
KILLING_TIMEOUT = 30
VM_KILLING_TIMEOUT = 60
WINDOWS_SOFTKILL =
For the "exclude parallel universe from preemption" part, I will now use
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED) && (JobUniverse =!= 11))
(I'm afraid "PREEMPT_VANILLA = False" is what kept vanilla universe jobs from
being preempted; I have removed it from the config now.)
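With the pieces combined, the whole startd policy might look like the sketch below. It assumes that Memory is the slot's provisioned memory in MB, that ImageSize is the job's footprint in KiB, and that JobUniverse 11 is the parallel universe. NOT_PARALLEL is a hypothetical helper macro introduced here for readability, not an HTCondor built-in:

```
# Let job use its declared amount of memory plus MEMORY_EXTRA MB per core
MEMORY_EXTRA = 2048
MEMORY_ALLOWED = (Memory + $(MEMORY_EXTRA)*Cpus)
# ImageSize is reported in KiB; convert to MB to compare with Memory
MEMORY_CURRENT = (ImageSize/1024)
MEMORY_EXCEEDED = ($(MEMORY_CURRENT) > $(MEMORY_ALLOWED))
# Hypothetical helper: true unless the job runs in the parallel universe (11)
NOT_PARALLEL = (JobUniverse =!= 11)
# Keep the preset PREEMPT and add the memory rule on top of it
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED) && $(NOT_PARALLEL))
# Evict immediately: no suspension, no vacate grace period
WANT_SUSPEND = False
WANT_VACATE = False
```

As a worked example: on a slot with Memory = 4096 and Cpus = 2, MEMORY_ALLOWED is 4096 + 2048*2 = 8192 MB, so a vanilla job whose ImageSize reaches 9437184 KiB (9216 MB) would be preempted, while a parallel universe job with the same footprint would not. Using =!= rather than != keeps the test true even when JobUniverse is undefined.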
Let's see what happens...
Thanks,
Steffen