HTCondor Project List Archives




Re: [Condor-devel] re-thinking definition of TotalRunningJobs



Idle jobs are indeed counted separately, but they obey very similar logic: they (a) are not counted for scheduler or local universe jobs, and (b) are incremented based on the difference between max hosts and current hosts. Point (b) can cause some transient miscounting in the situations Matt identified, where a job has just been matched or just removed and the value of current hosts hasn't caught up yet. Very possibly that constitutes an argument for making an analogous redefinition of Idle Jobs, as a simple count of jobs whose status is (ta-da) IDLE.
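
To make the difference concrete, here is a minimal sketch of the two ways of counting idle jobs. It is not the schedd's actual code: the `Job` class, its field names, the status strings, and the example job list are all hypothetical stand-ins, and the "just matched" case is modeled simply as a job marked RUNNING whose current hosts value is still 0.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; the real schedd uses its own codes.
IDLE, RUNNING = "IDLE", "RUNNING"
SKIPPED_UNIVERSES = {"scheduler", "local"}


@dataclass
class Job:
    universe: str
    status: str
    max_hosts: int = 1
    current_hosts: int = 0


def idle_by_host_difference(jobs):
    """Count as described above: skip scheduler/local universe jobs and
    add (max hosts - current hosts) for every other job."""
    total = 0
    for job in jobs:
        if job.universe in SKIPPED_UNIVERSES:
            continue
        total += job.max_hosts - job.current_hosts
    return total


def idle_by_status(jobs):
    """Proposed simplification: count jobs whose status is IDLE."""
    return sum(1 for job in jobs if job.status == IDLE)


example_jobs = [
    Job("vanilla", IDLE),
    Job("scheduler", IDLE),
    Job("local", IDLE),
    # Assumed transient: matched already, current hosts not yet updated.
    Job("vanilla", RUNNING, max_hosts=1, current_hosts=0),
]

print(idle_by_host_difference(example_jobs))  # 2
print(idle_by_status(example_jobs))           # 3
```

In this made-up queue the host-difference count skips the two scheduler/local jobs and picks up the transient one, while the status count does the reverse.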

My impression is that the current, less intuitive counting logic was adopted deliberately as a sidelong way to obtain universe-specific policies on job limits. My straw proposal is that the counts of running jobs and idle jobs should be re-simplified, and if that torpedoes any desired policies for universe-specific job limits, then that should be (re)addressed separately, for example as a response to gt#642. Since my experience with Condor encompasses all of 3 weeks, I'm supremely open to broader perspectives :-)
Erik


On Thu, 2010-03-11 at 11:44 -0600, Dan Bradley wrote:
It seems reasonable that TotalRunningJobs should be the total running 
jobs ;-)

I see that the negotiator depends on IdleJobs>0 (not TotalIdleJobs) in 
the submitter ad to decide whether it is worth negotiating with a 
submitter.  So we would just want to make sure that only jobs requiring 
negotiation are counted in the IdleJobs counter.  Maybe that's not 
connected to the counter for TotalIdleJobs, but you'd want to make sure 
before changing it.

--Dan
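
A rough sketch of the dependency Dan describes, assuming the submitter ad can be treated as a plain dictionary. The `worth_negotiating` helper, the sample ads, and the default of 0 for a missing attribute are hypothetical; only the `IdleJobs > 0` test comes from the message above.

```python
def worth_negotiating(submitter_ad):
    """Return True when the submitter ad advertises idle jobs.

    Treating a missing IdleJobs attribute as 0 is an assumption.
    """
    return submitter_ad.get("IdleJobs", 0) > 0


# Hypothetical submitter whose only idle job is a scheduler universe job.
# If IdleJobs counts only jobs that require negotiation, it stays at 0.
ad_negotiated_only = {"Name": "submitter-a", "IdleJobs": 0}

# If a simplified counter fed IdleJobs, that same job would be counted.
ad_all_idle = {"Name": "submitter-a", "IdleJobs": 1}

print(worth_negotiating(ad_negotiated_only))  # False
print(worth_negotiating(ad_all_idle))         # True
```

Under these assumptions, the second ad would lead to a negotiation attempt for a job that never needs a match, which is the case to check before changing the counter.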

Erik Erlandson wrote:
> (cribbed from earlier irc post to #distcomp)
>
> Is anybody aware of potential negative impacts of simplifying the 
> criteria for counting 'TotalRunningJobs' (aka scheduler.JobsRunning) 
> to something like "just count all jobs with status = {RUNNING | IDLE | 
> UNEXPANDED}"?  It pertains to addressing gt#602, and also gt#642 and 
> gt#334.  One impact would be more jobs of certain kinds would be 
> counted toward max-jobs limit.  Which may be why current counting 
> exceptions were put in place to begin with.
>
> http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=602
> http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=334
> http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=642
>
> -Erik