Re: [Condor-users] preempt and then hold?
- Date: Fri, 3 Dec 2004 12:37:46 -0600
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-users] preempt and then hold?
On Fri, Dec 03, 2004 at 11:42:16AM -0600, Scott Koranda wrote:
> Hi,
>
> For a long time we have set up our pool with
>
> PREEMPT = False
>
> so that the nodes in our cluster would not preempt a running
> job for any reason (of course, the negotiator could still
> cause jobs to preempt).
>
> Lately, however, a few users have been running jobs that
> malloc() a lot of memory and then eventually run the machine
> in full swap, which eventually takes them into the weeds.
>
> So we plan to change our configuration to
>
> PREEMPT = (TARGET.ImageSize > ( 512 * 1024))
>
> since each machine has 512 MB of physical memory (yes, the OS
> uses some but we don't mind a little use of swap).
>
> The idea is that when the job's memory usage grows, and Condor
> notices, it will preempt the running job.
>
> Two questions:
>
> 1) Will this work?
>
It should.

> 2) Is there any way to get the preempted job to be placed on
> hold so that the schedd doesn't have to continually process
> through these jobs trying to match them?
>
Maybe. With periodic_hold in the job's submit file, you could use
something like

periodic_hold = JobRunCount > 5

This isn't perfect, since you can't tell why Condor has tried to run
your job more than 5 times (it may have been preempted for priority
reasons, or for some reason other than memory).
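
For context, a minimal sketch of a submit description file carrying that
expression. The vanilla universe and the executable name my_job are
assumptions for illustration only, not taken from your setup:

```
# Sketch only: "my_job" is a hypothetical executable name.
universe      = vanilla
executable    = my_job

# Put the job on hold once Condor has started it more than 5 times.
# This counts every start, whatever the cause of the earlier evictions.
periodic_hold = JobRunCount > 5

queue
```

A job held this way stays in the queue, out of matchmaking, until
someone runs condor_release on it.
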
-Erik