
Re: [HTCondor-users] accurate way to query the number of jobs globally



On Fri, Feb 28, 2014 at 03:45:58PM +0100, Pek Daniel wrote:
> Thanks guys!
> 
> Steffen: if you have 10 million jobs, then condor_q has to process
> and stream out to the pipe 10M * 80 bytes (columns per line) = 800
> MByte of data, and then wc -l has to read the whole thing just to
> get a single number in the end. That sounds like overkill. But it's
> possible that it's the best I can do....

That's where the -constraint option enters the stage. (If you actually
have 10 million slots in your pool, I'd like to see that central
manager machine...)
And specifying a username would reduce the number of lines further.
Speaking of stale information: the time it takes condor_q to run may
be long enough to give you inaccurate counts as well. TBH I'm not
aware of any *precise* tool, and I never needed one.
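To make that concrete, here's a hedged sketch of the -constraint
approach: restrict the query to one user and emit one short token per
job instead of the full 80-column line, then count. The username
"someuser" is a placeholder, not from the original thread.

```shell
# Count only someuser's jobs across all schedds; -format prints one
# short line per matching job, so wc -l sees far less than 80 bytes/job.
condor_q -global -constraint 'Owner == "someuser"' -format "%d\n" ClusterId | wc -l
```

This still streams one line per job, but a line of a few bytes rather
than 80, and the constraint is evaluated on the schedd side, so the
pipe carries only the jobs you care about.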

> Tim: the problem is that what you get from condor_status -schedd is
> most of the time stale data. For example, when I remove all my jobs
> from the cluster, it still shows the old numbers for a while
> (minutes) and doesn't get updated instantly, even though
> condor_q -global already shows an empty queue.

AFAICT condor_status -schedd won't tell you about individual users -
so it's no "one size fits all" solution (even leaving that delay
aside) - but it does provide a nice, almost forgotten summary of the
available submit machines :)
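For what it's worth, a rough sketch of that summary (with the caveat
about staleness above): the schedd ads carry per-schedd job totals,
which you can print and sum. TotalRunningJobs/TotalIdleJobs/
TotalHeldJobs are standard schedd ClassAd attributes; double-check
against condor_status -schedd -long on your pool.

```shell
# Per-schedd totals from the collector's cached schedd ads (may lag
# reality by an update interval), summed across the pool with awk.
condor_status -schedd -autoformat Name TotalRunningJobs TotalIdleJobs TotalHeldJobs \
  | awk '{run+=$2; idle+=$3; held+=$4} END {print "running:", run, "idle:", idle, "held:", held}'
```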


- S