
Re: [Condor-users] preempt and then hold?



On Fri, Dec 03, 2004 at 11:42:16AM -0600, Scott Koranda wrote:
> Hi,
> 
> For a long time we have set up our pool with 
> 
> PREEMPT = False
> 
> so that the nodes in our cluster would not preempt a running
> job for any reason (of course, the negotiator could still
> cause jobs to preempt).
> 
> Lately, however, a few users have been running jobs that
> malloc() a lot of memory and then eventually run the machine
> in full swap, which eventually takes them into the weeds.
> 
> So we plan to change our configuration to
> 
> PREEMPT = (TARGET.ImageSize > ( 512 * 1024))
> 
> since each machine has 512 MB of physical memory (yes, the OS
> uses some but we don't mind a little use of swap).
> 
> The idea is that when the job's memory usage grows, and Condor
> notices, it will preempt the running job.
> 
> Two questions:
> 
> 1) Will this work?
> 

It should.
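
For reference, ImageSize is reported in KB, so 512 * 1024 corresponds
to 512 MB. A minimal sketch of the startd configuration, assuming you
want the limit in one place (MEMORY_LIMIT_KB is a made-up macro name,
not a built-in Condor setting):

```
# Hypothetical helper macro: 512 MB expressed in KB, the unit of ImageSize.
MEMORY_LIMIT_KB = (512 * 1024)

# In a startd policy expression, TARGET refers to the job ad.
PREEMPT = (TARGET.ImageSize > $(MEMORY_LIMIT_KB))
```

This expands to the same expression you posted; the macro only makes
the threshold easier to change per machine class.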

> 2) Is there any way to get the preempted job to be placed on
> hold so that the schedd doesn't have to continually process
> through these jobs trying to match them? 
> 

Maybe. With periodic_hold in the submit file, you could use something like

periodic_hold = JobRunCount > 5

This isn't perfect, since you don't know why Condor has tried to run
your job more than 5 times (maybe it's been preempted for priority
purposes, or for some reason other than memory).
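
In context, a submit file using this might look like the sketch below.
The executable name is a placeholder, and the ImageSize clause is my
own untested assumption about how to tie the hold to memory rather
than to the run count alone:

```
# Sketch of a user's submit file; my_job is a hypothetical executable.
universe   = vanilla
executable = my_job

# Hold after more than 5 starts, or (assumption) once the job's
# recorded image size exceeds 512 MB. ImageSize is in KB.
periodic_hold = (JobRunCount > 5) || (ImageSize > (512 * 1024))

queue
```

Note that this depends on users adding the line to their own submit
files, and I'd expect a released job to be held again if its recorded
ImageSize is still above the limit.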

-Erik