
Re: [HTCondor-users] incorrect values returned by condor_status



Thanks for getting back to me on this, John.

I’ll report back.

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Tuesday, October 17, 2017 12:59 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] incorrect values returned by condor_status

 

The most likely explanation I can see is that the schedd and the collector are updated at different times when a new job starts. If machines are starting new jobs frequently, the responses you get from the schedd will be a bit ahead of what you see in the collector.

 

I would be interested in knowing whether there are specific slots that consistently show up in the schedd but not in the collector. Try correlating the output of

 

  condor_status -const 'regexp("bridges",Machine)' -af Name

 

And the output of

 

  condor_q -all -const 'regexp("bridges",RemoteHost)' -af RemoteHost

 

Are there slots that show up in the condor_q output but not in the condor_status output? If so, are these slots newly created, or have they been around for a while?
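A minimal sketch of that correlation, assuming a POSIX shell with sort and grep, and assuming the slot names printed by "-af Name" and "-af RemoteHost" use the same slot@host form. The function name slots_missing_from_collector and the file names below are hypothetical. It prints each slot the schedd has a job running on that the collector does not currently list.

```shell
#!/bin/sh
# Hypothetical helper: print slots seen by the schedd but not by the collector.
#   $1 = file with one slot name per line (from condor_status ... -af Name)
#   $2 = file with one slot name per line (from condor_q ... -af RemoteHost)
slots_missing_from_collector() {
    # sort -u removes duplicate RemoteHost lines from the schedd's list.
    # grep -F (fixed strings) -x (whole-line match) -v (invert) keeps only
    # the schedd slots that match no line in the collector list.
    sort -u "$2" | grep -v -x -F -f "$1"
}
```

For example, redirect the condor_status command above to collector.txt and the condor_q command to schedd.txt, then run "slots_missing_from_collector collector.txt schedd.txt". Repeating this a minute or so apart should show whether the same slots keep appearing (a persistent gap) or the list changes each time (update lag as new jobs start).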

 

If the collector is dropping updates, then you *might* see fewer slots in the collector than you see in the schedd – but you would also expect to see that reflected in the collector statistics.

 

You could try running

 

condor_status -collector -long | grep UpdatesLost

 

Is the collector showing dropped updates?
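A small sketch for reading those statistics, assuming the collector ad is printed as one "Name = Value" attribute per line and that the relevant counters all begin with UpdatesLost, as the grep above assumes. The function name nonzero_updates_lost is hypothetical. It prints only the counters that are non-zero, so empty output means no lost updates were recorded.

```shell
#!/bin/sh
# Hypothetical helper: read ClassAd "Name = Value" lines on stdin and print
# every attribute whose name starts with UpdatesLost and whose value is non-zero.
nonzero_updates_lost() {
    # Split each line on " = "; "$2 + 0" forces a numeric comparison,
    # so "0" and "0.0" are both treated as zero and skipped.
    awk -F ' = ' '$1 ~ /^UpdatesLost/ && ($2 + 0) != 0 { print $1 " = " $2 }'
}
```

Usage would be "condor_status -collector -long | nonzero_updates_lost".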

 

-tj

 

 

 

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Krieger, Donald N.
Sent: Tuesday, October 17, 2017 7:11 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] incorrect values returned by condor_status

 

Dear list,

 

I have a bunch of glideins running on a machine called bridges.

I poll the number of job slots using: condor_status -const 'regexp("bridges",Machine)'

The numbers I get are considerably lower than those I get with condor_q -nobatch | grep -c bridges

The numbers given by the condor_q command are consistent with the number of glideins x job slots that I can see running.

The numbers given by the condor_q command are also consistent with the number of condor_shadows I see in the process list.

 

Any thoughts about this?

Is there a problem with the condor_status command as I’m using it?

Is there an error I could have made in configuring the glideins so that their job slots are not being counted by condor_status?

 

Thanks - Don