Re: [Condor-users] Job requirements not satisfied even when Requirements = TRUE
- Date: Wed, 31 Aug 2011 23:28:58 -0400
- From: "David J. Herzfeld" <herzfeldd@xxxxxxxxx>
- Subject: Re: [Condor-users] Job requirements not satisfied even when Requirements = TRUE
Hi Mark:
On Wed, 2011-08-31 at 20:14 -0700, Mark Cafaro wrote:
> David,
>
> If I understand you correctly, where our issues differ is that this
> occurs intermittently for you but occurs constantly for us. I can't get
> a single job to run.
You may be correct, but my hunch is that we are experiencing the same
issue. Our pool has 1,000+ slots, so an intermittent problem with one
node would likely manifest much differently if you had a smaller pool.
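
To compare notes, here is a minimal sketch for pulling the rejection lines
out of a startd log. The function name find_rejections and the command-line
usage are illustrative assumptions, not part of Condor; only the message
text is taken from this thread, and no particular log line layout is assumed.

```python
import sys
from typing import Iterable, List

# Message text as quoted in this thread. The rest of each log line
# (timestamp, slot name, etc.) is not assumed to have any fixed format.
REJECTION_TEXT = "Job requirements not satisfied"


def find_rejections(lines: Iterable[str]) -> List[str]:
    """Return every line that contains the rejection message, newline stripped."""
    return [line.rstrip("\n") for line in lines if REJECTION_TEXT in line]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_rejections.py /path/to/StartLog
    with open(sys.argv[1], errors="replace") as log:
        for hit in find_rejections(log):
            print(hit)
```

The log location varies by install; running condor_config_val STARTD_LOG on
the execute node should print it.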
> -Mark
>
>
> On Aug 31, 2011, at 8:05 PM, David J. Herzfeld wrote:
>
> > Hi Garrett:
> >
> > On Thu, 2011-09-01 at 02:45 +0000, Koller, Garrett wrote:
> >> Mr. Cafaro,
> >>
> >> I'm confused. I thought the problem was that the job kept being
> >> rejected with the error "Job requirements not satisfied."
> >
> > While I will not speak for Mark, I can speak for the issues that I have
> > encountered (which appear to be at least superficially similar). Yes,
> > the error you quoted is correct. To be clear -
> >
> > This happens after a successful negotiation and match with an available
> > startd (i.e. the job requirements and machine START expression ClassAds
> > match). The second requirements check, which happens on the execute
> > machine, fails with "Job requirements not satisfied" (the error shows up
> > in the startd log without ever spawning a starter) - this is not a
> > negotiator error, so a condor_q -analyze would not help.
> >
> >> If that is so, how could it be matched in the MatchLog? Was it just
> >> considered in the MatchLog or was it actually assigned to a specific
> >> slot on a specific computer? If the MatchLog says it found a proper
> >> match and actually assigned it to that computer, check out
> >> http://servo.cs.wlu.edu/dokuwiki/doku.php/condor/submit/troubleshoot
> >> for a possible reason and solution to this problem.
> >
> > The machines are matched correctly, but the initial execution of the job
> > executable by the starter never occurs, so I don't believe the
> > information in this page is relevant to this issue. Thanks for the
> > suggestion in any case - this clarification would likely be important to
> > any condor developers looking at this issue.
> >
> > DJH
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/