> The problem I'm trying to solve in the current production
> environment is that User-A would submit thousands of vanilla
> jobs (one or more clusters).
> Runtime for each job is typically under 1 hour. User-B
> submits a few jobs and never gets access to any machines. The
> greater insult is that User-A then submits more jobs and they
> run before the User-B jobs.
> I don't necessarily want preemption to
> kill/checkpoint/restart the User-A jobs, just to insert a
> wedge so User-B can get access to some resources within a
> reasonable period of time. I stumbled on MaxJobRetirementTime
> while reading this mailing list. Not finding it in the
> version 6.6.7 manual, I began exploring 6.7.3. It does EXACTLY
> what I need: simple, clean, and straightforward when used
> with a simple PREEMPTION_REQUIREMENTS expression based on
> priority and a PRIORITY_HALFLIFE shorter than 1 day.
Doak,
I'd be very interested in hearing how you're changing
PREEMPTION_REQUIREMENTS. I'm doing something very similar and have run
into a few issues. See the threads with subjects "Default User priority
factor", "When do machine RANK settings apply?" and "about scheduling
algorithm in condor and condor-g" for some background on what I am
trying to do. Does that look familiar?
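For anyone following along, the combination described in the quoted
message might look roughly like the fragment below. This is only a
sketch: the values are illustrative assumptions, not Doak's actual
settings, and the expression is a simplified priority comparison.

```
# Negotiator side: decay accumulated usage faster than the 1-day default.
# Value is in seconds; 6 hours is an assumed example.
PRIORITY_HALFLIFE = 21600

# Negotiator side: allow preemption when the running user's priority value
# is sufficiently worse (numerically larger) than the waiting user's.
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio * 1.2

# Startd side: give a preempted job up to 1 hour (in seconds) to finish
# before it is killed, so jobs that run under an hour complete normally.
MAXJOBRETIREMENTTIME = 3600
```

With settings along these lines, a preempted User-A job would be allowed
to run to completion, and the freed slot would then go to User-B.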
Cheers!
- Ian C.