It could also be that the machine "bridges" is in trouble: the jobs start there, die, get disconnected, and so on. It takes a while for the schedd to give up on a disconnected job, and in the meantime the job still shows in condor_q -run even though it is no longer on the machine. If so, StarterLog.* on "bridges" and/or ShadowLog on your schedd will tell the story.
Steve
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of John M Knoeller <johnkn@xxxxxxxxxxx>
Sent: Tuesday, October 17, 2017 11:59:06 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] incorrect values returned by condor_status

The most likely reason I could see for this would be that the schedd and collector update at different times when a new job starts. If there are machines starting new jobs frequently, then the responses you get from the schedd will be a bit ahead of what you see in the collector.

I would be interested in knowing if there are specific slots that consistently show up in the schedd but not in the collector. Try correlating the output of

condor_status -const 'regexp("bridges",Machine)' -af Name

and the output of

condor_q -all -const 'regexp("bridges",RemoteHost)' -af RemoteHost

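One possible way to do that correlation is sketched below. It assumes the two command outputs have been saved to files with one slot name per line, and that the slot names in RemoteHost and Name are formatted identically. The function name and the file names are hypothetical; only the two condor commands come from this message.

```shell
# Sketch: list slots the schedd reports as running jobs but which the
# collector does not currently advertise.
# $1 = file of RemoteHost values from condor_q (schedd view)
# $2 = file of Name values from condor_status (collector view)
slots_missing_from_collector() {
    # sort -u removes duplicate slot names from the schedd list;
    # grep -F -x -v -f keeps only whole lines that are absent from $2.
    # "|| true" hides grep's non-zero exit status when nothing is missing.
    sort -u "$1" | grep -F -x -v -f "$2" || true
}

# Usage on the submit host (commented out; file names are placeholders):
# condor_q -all -const 'regexp("bridges",RemoteHost)' -af RemoteHost > schedd_slots.txt
# condor_status -const 'regexp("bridges",Machine)' -af Name > collector_slots.txt
# slots_missing_from_collector schedd_slots.txt collector_slots.txt
```

Running this a few times, some minutes apart, would show whether the same slots keep appearing in the difference or whether the set changes as new jobs start.
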
Are there slots that show up in the condor_q output but not the condor_status output? If so, are these slots newly created, or have they been around for a while? If the collector is dropping updates, then you *might* see fewer slots in the collector than you see in the schedd, but you would also expect to see that reflected in the collector statistics.
You could try running

condor_status -collector -long | grep UpdatesLost

Is the collector showing dropped updates?

-tj

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Krieger, Donald N.

Dear list,

I have a bunch of glideins running on a machine called bridges. I poll the number of job slots using:

condor_status -const 'regexp("bridges",Machine)'

The numbers I get are considerably lower than those I get with

condor_q -nobatch | grep -c bridges

The numbers given by the condor_q command are consistent with the number of glideins x job slots that I can see running. They are also consistent with the number of condor_shadows I see in the process list.

Any thoughts about this? Is there a problem with the condor_status command as I’m using it? Is there an error I could have made in configuring the glideins so that their job slots are not being counted by condor_status?

Thanks - Don