At 12:09 PM 8/1/2006, Rick Lan wrote:
Hi,

Setting NEGOTIATOR_CONSIDER_PREEMPTION = True seems to work. However, at first jobs would begin to run, then some of the jobs would get stuck as "match but reject the job for unknown reasons" for about 15 minutes and then start running. Now it has been stuck for 2 hours. I've attached the SchedLog and NegotiatorLog below.

8/1 22:06:02 Rejected 93.0 malikr@xxxxxxxx <172.26.30.23:3179>: no match found

The above line is strange in that previous jobs have an identical submit file except for file paths.
Obvious question, but do you have (or did you have) "Unclaimed" machines in your pool according to condor_status?
Try running "condor_status -state" and see how long these Unclaimed machines have been Unclaimed (by looking at the StateTime column). Perhaps these machines are being claimed and start running a job, but then immediately toss the job off? In that case, whenever you look, you would typically see the machine Unclaimed and the job idle. This could happen if, for example, the stdin file specified in the submit file does not exist, or something like that.
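To rule out the missing-stdin case quickly, something like the following Python sketch could be run against the submit file before resubmitting. It is only a rough illustration: the function name `missing_input_files` is made up for this example, it looks only at plain `input = <path>` lines, and it does not expand macros such as $(Process) or honor `initialdir`, so relative paths are resolved against whatever `base_dir` you pass in.

```python
import os


def missing_input_files(submit_text, base_dir="."):
    """Return the paths named on `input = ...` lines that do not exist.

    Simplified on purpose: no macro expansion, no `initialdir` handling.
    The helper name and behavior are assumptions made for this sketch.
    """
    missing = []
    for raw in submit_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment
        # (for example a bare `queue`).
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Compare the command name case-insensitively.
        if key.strip().lower() != "input":
            continue
        path = value.strip()
        if not path:
            continue
        # Resolve relative paths against base_dir; keep absolute paths as-is.
        full = path if os.path.isabs(path) else os.path.join(base_dir, path)
        if not os.path.isfile(full):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical submit description, for illustration only.
    example = "executable = my_job\ninput = data/run1.in\nqueue\n"
    for path in missing_input_files(example):
        print("stdin file not found:", path)
```

If this reports a path, fix or create that file and check whether the machines stay Claimed afterwards. An empty result only rules out this one cause, not others.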
-Todd

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Todd Tannenbaum                     University of Wisconsin-Madison
Condor Project Research             Department of Computer Sciences
tannenba@xxxxxxxxxxx                1210 W. Dayton St. Rm #4257
http://www.cs.wisc.edu/~tannenba    Madison, WI 53706-1685
Phone: (608) 263-7132               FAX: (608) 262-9777