RE: [Condor-users] When do machine RANK settings apply?
- Date: Wed, 5 Jan 2005 15:57:40 -0500
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: RE: [Condor-users] When do machine RANK settings apply?
> > Looking at the NegotiatorLog it wants to preempt bchan's
> jobs for mine
> > but it can't because PREEMPTION_REQUIREMENTS are false. I
> think what
> > I'm observing here is that bchan's schedd is holding on to the startd
> > machine after a job finishes and just running the next job in her
> > list. Why is my higher-ranking job not taking over this machine?
>
> That is an issue - essentially, if a user retains a claim to
> the machine then they can keep sending lower priority jobs
> to it. It seems the negotiator <annoyingly> decides that,
> having checked preemption based on user priority and been
> told no, it won't bother checking whether the machine rank
> makes a difference...
I think, for my vanilla jobs in conjunction with my very long job
retirement time, I should be safe and perhaps better off saying:
PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (1 * (60 * 60)) && MYRANK < TARGET.RANK
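To spell out what I intend that expression to test, here is a rough Python sketch. It is only a toy model, not how Condor evaluates ClassAds: the function and argument names are made up for illustration, and it assumes MYRANK stands for the machine's rank of the job currently running and TARGET.RANK for its rank of the candidate job.

```python
ONE_HOUR = 60 * 60  # seconds, mirrors (1 * (60 * 60)) in the expression


def preemption_allowed(current_time, entered_current_state,
                       running_job_rank, candidate_job_rank):
    """Toy model of the PREEMPTION_REQUIREMENTS expression above.

    All names here are hypothetical; the rank arguments encode the
    assumption that MYRANK is the running job's rank and TARGET.RANK
    is the candidate job's rank.
    """
    # The machine must have been in its current state for over an hour.
    in_state_long_enough = (current_time - entered_current_state) > ONE_HOUR
    # The candidate job must rank strictly higher than the running job.
    candidate_ranks_higher = running_job_rank < candidate_job_rank
    return in_state_long_enough and candidate_ranks_higher
```

With this reading, a higher-ranked candidate is only allowed to preempt once the machine has spent more than an hour in its current state; an equal or lower rank never qualifies.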
> Just to check: if you release all those jobs at the same time
> (with only 2 machines to execute the three of them) so that
> a single negotiation cycle happens, does the right allocation occur?
I'll have to test this. I'll need to get my two other users to submit
some dummy jobs.
> I was aware of the problem you describe on 6.6 (I very
> occasionally have to execute a condor_vacate to force things
> to realign if two users have identically tiered jobs but one
> got a 'head start' and is therefore holding onto it) but the 6.7
> retirement in theory should have allowed me to enable user
> preemption where a slight disparity exists coupled with max
> job retirement to avoid thrashing.
Right. This is what's got me thinking that I'm better off allowing
preemption based on ranks using PREEMPTION_REQUIREMENTS. I won't thrash
because of MaxJobRetirementTime. Although, I've since tweaked my setting
back to allowing preemption and bchan has still got a firm hold on that
startd.
> All is not lost though - I think you may have forgotten about
> your 2 day retirement time... the negotiator does recheck
> when a "preemption pending retirement" exists in case the
> preempting job goes away; this lets the retirement be withdrawn.
>
> If the retirement is present but the schedd is still
> accepting jobs then that's a BUG (didn't someone else mention
> this a while back? Did it get identified/resolved?)...
I'm not sure what this means. Is there a way I can check for this?
> If anyone at cs.wisc can see why this might be happening,
> please do chip in here, but I'm hitting a brick wall now.
>
> Clearly more than one group would like to use condor in a
> "Job then User" setting. Condor, for all its vaunted
> flexibility, does not make this easy (jury still out on
> possible). I see the reasons it doesn't, since
> considerable optimization of the startd/negotiator comms
> overhead can be performed this way.
> However, these optimizations make what we are attempting to do
> excruciatingly unpleasant.
Agreed. I'll take a slowdown and some gross inefficiency in the
negotiator to get to where I want to be today: job priority based
scheduling and not user based scheduling.
Ian