
Re: [Condor-users] condor's calculated memory vs image size of jobs in queue

Date: Wed, 16 May 2007 12:13:53 -0700
From: Stuart Anderson <anderson@xxxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] condor's calculated memory vs image size of jobs in queue

Paul,
This looks like it might be a problem with the automatic job clustering.
As a test, you might try enabling NEGOTIATE_ALL_JOBS_IN_CLUSTER to see if that
solves the problem before digging deeper.
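
A minimal sketch of that test, assuming the setting is added to the local
configuration file on the submit machine (the file location varies by
installation):

```
# Assumption: placed in the submit machine's local Condor configuration.
# Asks the schedd to keep trying the remaining idle jobs in a cluster
# even after one job in that cluster fails to match.
NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
```

Running condor_reconfig on the submit machine afterwards should make the
schedd pick up the change.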

Thanks.

On Wed, May 16, 2007 at 11:55:44AM -0500, Paul Armor wrote:
> 
> I'm noticing an interesting edge case in our pool, where a user has lots 
> of jobs queued up. Some get evicted after some amount of run time and then 
> fail to match when they try to pick up where they left off after a 
> checkpoint/eviction, because their SIZE has grown larger than the "Memory" 
> value determined at startup on the compute node.  When that job has the 
> lowest job id for that user in the queue, the schedd will just spin from 
> that point on, only trying to schedule that job, and no others...
> 

-- 
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson
