Re: [Condor-users] Why does machine reject job for unknown reasons
- Date: Tue, 15 May 2007 08:55:08 -0700 (PDT)
- From: "Tony Rippy" <trippy@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Why does machine reject job for unknown reasons
>> condor_q -better-analyze 1082109.0
>
> 1082109.000: Run analysis summary. Of 152 machines,
> 2 are rejected by your job's requirements
> 0 reject your job because of their own requirements
> 0 match but are serving users with a better priority in the pool
> 150 match but reject the job for unknown reasons
> 0 match but will not currently preempt their existing job
> 0 are available to run your job
Hi Alex,
Based on the better-analyze results above, it looks like the startds are
rejecting the job: 150 of the 152 machines match but then reject it. I
would start by checking the startd policy at your site. There is more
information about this here:
http://www.cs.wisc.edu/condor/manual/v6.8/3_5Startd_Policy.html
One culprit might be a bad Start expression. You can check the Start
expression of your execute nodes by running the following command:
condor_status -startd -format "%s" Machine -format ": Start = %s\n" Start
If the Start expression seems OK, then try looking through the negotiator
log on your central manager. It contains more information about why a
match isn't being made, but you may have to turn on additional logging.
There is more information about logging levels in section 3.3.4 of the
Condor manual:
http://www.cs.wisc.edu/condor/manual/v6.8/3_3Configuration.html
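
As a rough sketch of that last step, something like the following could be
used on the central manager to pull out the negotiator log lines that mention
the job. The helper name job_log_lines is made up for illustration, and it
assumes nothing about the log format beyond the job ID appearing as plain
text. The log path can usually be found with "condor_config_val
NEGOTIATOR_LOG", and one common way to get more detail is to set
NEGOTIATOR_DEBUG = D_FULLDEBUG in the local configuration and run
condor_reconfig, though your site may prefer a different logging level.

```shell
# Hypothetical helper (not part of Condor): print the most recent lines of a
# log file that mention a given job ID.
# Usage: job_log_lines LOGFILE JOBID [MAXLINES]
job_log_lines() {
    log_file="$1"
    job_id="$2"
    max_lines="${3:-20}"   # assumption: the last 20 matches are enough

    # -F treats the job ID as a fixed string, so the "." is not a wildcard.
    grep -F -- "$job_id" "$log_file" | tail -n "$max_lines"
}

# Example call on the central manager (left commented out because it needs a
# working Condor installation):
#   job_log_lines "$(condor_config_val NEGOTIATOR_LOG)" 1082109.0
```

If nothing shows up, the job may not have been considered in a recent
negotiation cycle, or the logging level may still be too low.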
Good luck!
==========================
Tony Rippy
Cycle Computing, LLC
http://www.cyclecomputing.com