RE: [Condor-users] Trouble with job priority and job retirement
- Date: Tue, 14 Dec 2004 16:30:13 -0500
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: RE: [Condor-users] Trouble with job priority and job retirement
I really think this has to do with the fact that my one user had
received 0 resources from the system during the negotiation cycle. Even
though there were no other users vying for resources, her effective user
priority was high and netted her 0 resources, so the negotiator ignored
her new job that had higher priority than her old jobs. Does this seem
plausible?
- Ian
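
The "resource pie" Dan describes in the quoted message below can be
illustrated with a toy model. This is only a sketch, assuming each
submitter's slice is inversely proportional to effective user priority
and rounded down; slice_pie and its arguments are hypothetical names,
and this is not the negotiator's actual code.

```python
from math import floor

def slice_pie(pie_size, effective_priority):
    """Split pie_size slots among submitters (hypothetical helper).

    Assumption: a numerically lower effective priority is better, and
    each slice is proportional to 1 / priority, rounded down.
    """
    weights = {user: 1.0 / prio for user, prio in effective_priority.items()}
    total = sum(weights.values())
    return {user: floor(pie_size * (w / total)) for user, w in weights.items()}

# A lone submitter with a poor (numerically high) priority.
print(slice_pie(2, {"bchan": 500.0}))
# The same submitter when there is nothing left to hand out.
print(slice_pie(0, {"bchan": 500.0}))
```

In this toy model a lone submitter receives the whole pie whatever her
priority value is, so a slice of 0 would have to come from an empty pie.
Whether the real negotiator behaves this way is not established in this
thread.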
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> Sent: December 14, 2004 12:49 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Trouble with job priority and job
> retirement
>
>
> I cannot reproduce any problems with a match record not
> getting deleted when a claim timeout happens. If you are
> still having a problem, please send the relevant StartLog,
> NegotiatorLog, and SchedLog to condor-admin and I'll try to
> see what is going on.
>
> --Dan
>
> Dan Bradley wrote:
>
> > Ian,
> >
> > In a case such as the one you describe, where job 2.0 preempts job 1.0
> > and has to wait around for 1.0 to finish, there are two possible
> > cases. One is that 1.0 finishes and 2.0 claims the machine. The
> > other is that the schedd times out waiting for 2.0 to get an active
> > claim (controlled by REQUEST_CLAIM_TIMEOUT), and it tries getting a
> > new match for 2.0. From your description of what is happening, I am
> > concerned that when the timeout happens, the previous match is not
> > getting correctly removed. I will double-check this case and get back
> > to you. If you set REQUEST_CLAIM_TIMEOUT to a very large number, you
> > should be able to remove this case from even being a possibility.
> >
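
The two cases Dan lists above can be written as a small decision
function. This is a minimal sketch, not schedd code: the function name,
the treatment of the boundary, and the 1800-second timeout in the usage
lines are assumptions made for illustration. Only the 2-day retirement
time comes from this thread.

```python
def preempting_job_outcome(seconds_until_old_job_exits, request_claim_timeout):
    """Return which of the two cases applies to the waiting job.

    Hypothetical model: the waiting job claims the machine if the
    retiring job exits before the timeout expires; otherwise the
    schedd gives up on the claim and asks for a new match.
    """
    if seconds_until_old_job_exits <= request_claim_timeout:
        return "claims the machine"
    return "schedd times out and requests a new match"

TWO_DAYS = 2 * 24 * 60 * 60  # retirement time mentioned in the thread

# Assumed timeout of 1800 seconds, far shorter than the retirement time.
print(preempting_job_outcome(TWO_DAYS, 1800))
# A very large timeout removes the second case, as Dan suggests.
print(preempting_job_outcome(TWO_DAYS, 10**9))
```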
> > You also asked about the meaning of, "Over submitter resource limit
> > (0) ... only consider startd ranks." This means that when Condor
> > sliced up the resource pie between job submitters, this user got a
> > slice of size 0.
> >
> > --Dan
> >
> > Ian Chesal wrote:
> >
> >> I'm trying to get a better handle on job retirement. I'm observing a
> >> strange situation in our current 6.7.2 system which uses the
> >> retirement feature. We have a fairly long retirement time set (2
> >> days). I have a user that has 100 jobs queued as cluster 1. Two of
> >> the jobs are running on the available resources. She queues up a
> >> 101st job, as cluster 2, at a higher priority than the 100
> >> previously queued jobs.
> >>
> >> The negotiator log at time t indicates that it has matched her 2.0
> >> job and is preempting job 1.0 running on machine-A. Later, at
> >> negotiation cycle t+1, job 1.1 finishes running on machine-B. Rather
> >> than assign the high priority job, 2.0, to the now free machine-B at
> >> negotiation cycle t+2, I'm seeing a lower priority job, 1.11, get
> >> assigned to the machine.
> >>
> >> My question is this: once a job is moved to retirement on behalf of a
> >> queued, higher priority job, is that waiting job bound to be assigned
> >> to that particular machine? Can it not use the next available
> >> resource? I get the feeling that the job is exempted from future
> >> negotiation cycles because once I see a message saying job 1.0 is
> >> being preempted for job 2.0 I don't see any more negotiator messages
> >> for job 2.0 in subsequent negotiation cycles. Is there a point in
> >> time when the 2.0 job will give up waiting for the 1.0 job to retire
> >> and be renegotiated?
> >>
> >> I am also seeing this very odd message in my NegotiatorLog printed at
> >> the start of her portion of the negotiation cycle:
> >>
> >> 12/13 16:00:02 Over submitter resource limit (0) ... only consider startd ranks
> >>
> >> This is printed for the user "bchan" who is experiencing the
> >> inability to get her higher priority job running before her lower
> >> priority jobs. What does this message mean? I couldn't find an answer
> >> searching the archives unfortunately, although I did notice this
> >> question has been asked a few times.
> >>
> >> Another user and I tested that priority works, and for us it
> >> wasn't a problem. But in the NegotiatorLog file there were no
> >> "Over submitter" messages for our sections of the negotiation cycle.
> >> I suspect her problems relate to this message.
> >>
> >> Thanks!
> >>
> >> - Ian Chesal
> >>
> >>
> >>
> >>
> >> --
> >> Ian R. Chesal <ichesal@xxxxxxxxxx>
> >> Senior Software Engineer
> >>
> >> Altera Corporation
> >> Toronto Technology Center
> >> Tel: (416) 926-8300
> >>
> >>
> >> _______________________________________________
> >> Condor-users mailing list
> >> Condor-users@xxxxxxxxxxx
> >> http://lists.cs.wisc.edu/mailman/listinfo/condor-users