HTCondor Project List Archives




Re: [Condor-devel] Jobs lying idle when machines are available.



Thanks for the reply, Todd. I will collect the logs and email them to you.

Regards
Mahadev

> -----Original Message-----
> From: Todd Tannenbaum [mailto:tannenba@xxxxxxxxxxx]
> Sent: Thursday, November 09, 2006 8:37 AM
> To: Mahadev Konar; condor-devel@xxxxxxxxxxx
> Subject: Re: [Condor-devel] Jobs lying idle when machines are available.
> 
> 
> Hi Mahadev -
> 
> With D_FULLDEBUG specified for both the schedd and the negotiator,
> could you please send along more of the NegotiatorLog when this
> problem happens? Such as the entire section of the NegotiatorLog for
> that negotiation cycle?  Also, if you could include the section of
> the ScheddLog during the time of the negotiation cycle, that'd be great.
> 
> Armed with the above info, hopefully we can gain some insight re what
> is happening at your site.
> 
> Maybe sending to condor-admin with the log files would be a good idea
> (both because they could be long and they could contain IP addresses
> etc you don't want passed around on a public email list) -  we could
> summarize what we find back to this list if folks are interested.
> 
> thanks,
> Todd
> 
> 
> 
> At 04:57 PM 11/8/2006, Mahadev Konar wrote:
> >Hi all,
> >   While experimenting with Condor on our cluster, we encountered the
> >following problem.
> >
> >After we submit a bunch of jobs, Condor is able to schedule a few of them
> >but for the others we get:
> >
> >100 match but reject the job for unknown reasons.
> >
> >There are machines lying idle on the cluster, not running anything, but
> >Condor does not schedule jobs on them. Also, all the jobs are similar in
> >nature and so are the machines, meaning that if one job matches a machine,
> >so should the others.
> >Also, this happens every 2nd or 3rd time we submit a bunch of jobs to the
> >cluster. Sometimes it does run all the jobs in the queue.
> >
> >After taking a look at the Negotiator logs, I saw this:
> >
> >Attempting to use cached MatchList: Failed (MatchList length: 0,
> >Autocluster: 0, Schedd Name: ****, Schedd Address: *****)
> >11/8 22:40:43       Rejected jobid schedd_name schedd_ip : no match found
> >
> >The above is for the jobs for which condor_q -analyze says "match but
> >reject the job for unknown reasons".
> >
> >Could this be due to some misconfiguration?
> >Is there any way to debug why this would be happening? Are there any tools
> >to find out why a job is not matching a specific startd? It would be great
> >if someone could point me to some debugging tools for this.
> >
> >Thanks
> >Mahadev
> >
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Todd Tannenbaum                       University of Wisconsin-Madison
> Condor Project Research               Department of Computer Sciences
> tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
> http://www.cs.wisc.edu/~tannenba      Madison, WI 53706-1685
> Phone: (608) 263-7132  FAX: (608) 262-9777
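
Todd's request for "the entire section of the NegotiatorLog for that negotiation cycle" can be sketched as a small Python filter. This is only an illustration under stated assumptions: it assumes each cycle in the log is bracketed by lines containing "Started Negotiation Cycle" and "Finished Negotiation Cycle" (check these markers against your own log), and the function name and the log path in the usage note are hypothetical.

```python
from typing import Iterable, List

# Assumed cycle delimiters; verify them against your own NegotiatorLog.
CYCLE_START = "Started Negotiation Cycle"
CYCLE_END = "Finished Negotiation Cycle"


def cycles_with_rejections(lines: Iterable[str],
                           needle: str = "Rejected") -> List[List[str]]:
    """Return every complete negotiation cycle that contains `needle`.

    Each cycle is returned as a list of log lines, from its start marker
    through its finish marker. A cycle that is cut off at the end of the
    input (no finish marker) is dropped.
    """
    cycles: List[List[str]] = []
    current = None  # None means we are currently outside any cycle
    for raw in lines:
        line = raw.rstrip("\n")
        if CYCLE_START in line:
            current = [line]          # begin collecting a new cycle
        elif current is not None:
            current.append(line)
            if CYCLE_END in line:
                # Keep the cycle only if it mentions a rejection.
                if any(needle in entry for entry in current):
                    cycles.append(current)
                current = None
    return cycles
```

To use it, open the negotiator's log file (for example a hypothetical path such as `/path/to/NegotiatorLog`), pass the file object to `cycles_with_rejections`, and print the returned lines. The same time window can then be looked up by hand in the ScheddLog.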