
Re: [HTCondor-users] memory leak in condor_q?



Hi Daniel,

The 8.1.4 release is currently in testing. It is slated for release on February 27th. Our upcoming release plans can be found at https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=DeveloperReleasePlan

...Tim

On 02/07/2014 01:58 AM, Pek Daniel wrote:
Thank you Todd! We need to measure condor_q -global performance because
our users rely on it extensively nowadays (more precisely, on its
counterpart in our current system). They use it to poll their jobs,
grep through the output, etc. Even worse, we can't/don't want to tie a
specific user to a specific submission node (so that load is balanced
more efficiently between schedds), which means that a user who wants to
find their job has to query with -global... Also, the dedicated schedd
nodes won't be reachable by our users through ssh or the like, so every
submission will be a -remote (or -name) submission.
With HTCondor, I'm aware there are much more efficient ways to do this
polling (like checking the job log) that won't affect the service
nearly as much, but we have to keep in mind that bad habits die hard:
in the "transitional" phase there will surely be some users who "stick
to" the global query way of polling their jobs, so we have to be
prepared.
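
As a rough illustration of the job-log approach, here is a minimal Python
sketch. It assumes the classic text user log, where each event header starts
with a three-digit event number followed by (cluster.proc.subproc), and it
maps only a handful of event numbers. The function name, the state labels
and the sample log are made up for this example.

```python
import re

# A few event numbers from the text user log; labels are this sketch's own.
EVENT_STATES = {
    "000": "submitted",
    "001": "executing",
    "004": "evicted",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
    "013": "released",
}

# Event header, e.g. "005 (1234.000.000) 02/06 10:02:33 Job terminated."
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) ")


def job_states(log_text):
    """Return {(cluster, proc): label of the last mapped event seen}."""
    states = {}
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # event body line or the "..." terminator
        code, cluster, proc = match.groups()
        state = EVENT_STATES.get(code)
        if state is not None:
            states[(int(cluster), int(proc))] = state
    return states


# Made-up sample in the assumed format, for illustration only.
SAMPLE_LOG = """\
000 (1234.000.000) 02/06 09:46:01 Job submitted from host: <10.0.0.1:9618>
...
001 (1234.000.000) 02/06 09:47:10 Job executing on host: <10.0.0.2:9618>
...
000 (1235.000.000) 02/06 09:48:00 Job submitted from host: <10.0.0.1:9618>
...
005 (1234.000.000) 02/06 10:02:33 Job terminated.
\t(1) Normal termination (return value 0)
...
"""

if __name__ == "__main__":
    for job, state in sorted(job_states(SAMPLE_LOG).items()):
        print(job, state)
```

Because this only reads a file the user already owns, it puts no load on
the schedd, however often it is run.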
I had a look at
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ReleaseHistory, but
is there a rough estimate of when the next dev release (8.1.4) will be
out?

Thanks,
Daniel

2014-02-06 Todd Tannenbaum <tannenba@xxxxxxxxxxx>:
Indeed, as Brian says, the patches going into the code address the
condor_q -stream issue below.  Note that an often overlooked downside of
"condor_q -stream" is that the results will not be sorted in any manner:
normally the schedd sends the job ads out in hash table order and condor_q
does the sorting.

Another often overlooked option is using condor_status with either "-schedd"
or "-submitter" instead of polling condor_q in order to get "big picture"
aggregate statistics like total jobs running/idle per schedd or user,
respectively.  There is no need to use condor_q to slowly get a dump of all
500,000 jobs and then count them to get this sort of aggregate info;
condor_status is much faster and less resource intensive.  Yes, condor_status
is giving you "cached" information that may only be updated once or twice a
minute, but in many situations (like portals that want to update a web page
or what have you) that is just fine...
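
For a portal-style script, the condor_status route could look roughly like
the Python sketch below. It assumes that your condor_status supports
-autoformat and that the schedd ads carry the attributes TotalRunningJobs
and TotalIdleJobs; the function names are invented for this example, so
treat it as a starting point rather than a reference.

```python
import subprocess

# Assumed schedd ad attributes; adjust if your pool publishes different ones.
ATTRS = ["Name", "TotalRunningJobs", "TotalIdleJobs"]


def parse_schedd_totals(text):
    """Parse autoformat-style output: one schedd per line, space-separated."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(ATTRS):
            continue  # blank or malformed line
        name, running, idle = fields
        if not (running.isdigit() and idle.isdigit()):
            continue  # an attribute was undefined for this ad
        totals[name] = {"running": int(running), "idle": int(idle)}
    return totals


def query_schedd_totals():
    """Ask the collector for per-schedd totals (needs condor_status on PATH)."""
    cmd = ["condor_status", "-schedd", "-autoformat"] + ATTRS
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_schedd_totals(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to
test the parser against captured output without a running pool.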

regards,
Todd


On 2/6/2014 10:11 AM, Pek Daniel wrote:
Yaaay, perfect! :)

2014-02-06 Brian Bockelman <bbockelm@xxxxxxxxxxx>:
Hi Daniel,

This is expected.  Stream does not stream (well, until my patches land).
It still buffers the entire response in memory before parsing it.

Stream prevents HTCondor from sorting the results in memory.

The non-blocking condor_q patch set will take care of this and turn it into a
real stream.
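
To make the distinction concrete, here is a toy Python sketch of the two
strategies. It is not HTCondor's code or wire format: it assumes a made-up
text protocol in which each job ad is a run of "Key = Value" lines ended by
a blank line, and all function names are hypothetical.

```python
import io


def parse_ad(lines):
    """Turn 'Key = Value' lines into a dict (a stand-in for a job ad)."""
    return dict(line.split(" = ", 1) for line in lines)


def read_buffered(stream):
    """Read the whole response, then parse it: memory grows with queue size."""
    raw = stream.read()
    return [parse_ad(chunk.splitlines())
            for chunk in raw.split("\n\n") if chunk.strip()]


def read_streaming(stream):
    """Yield each ad as soon as its blank-line terminator arrives."""
    current = []
    for line in stream:
        line = line.rstrip("\n")
        if line:
            current.append(line)
        elif current:
            yield parse_ad(current)
            current = []
    if current:  # last ad without a trailing blank line
        yield parse_ad(current)


# Made-up response in the toy format described above.
RESPONSE = "ClusterId = 1\nJobStatus = 2\n\nClusterId = 2\nJobStatus = 1\n"

if __name__ == "__main__":
    for ad in read_streaming(io.StringIO(RESPONSE)):
        print(ad)
```

Both functions return the same ads, but the streaming version only ever
holds one ad at a time, so its peak memory does not depend on how many jobs
are in the queue.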

Sent from my iPhone

On Feb 6, 2014, at 9:46 AM, Pek Daniel <pekdaniel@xxxxxxxxx> wrote:

I've noticed there isn't much difference between the memory
consumption of condor_q -stream and condor_q:

[root@XXX thrash]# /usr/bin/time -v condor_q -stream >/dev/null
...
Maximum resident set size (kbytes): 186592
...

[root@XXX thrash]# /usr/bin/time -v condor_q >/dev/null
...
Maximum resident set size (kbytes): 306352
...

There were 100k jobs in the queue. I've dug into the source a bit, and
I suspect a leak somewhere here:

https://github.com/htcondor/htcondor/blob/b151357dcd13efe2703a2386e1d89bbacac79cd6/src/condor_schedd.V6/qmgmt_send_stubs.cpp#L862-L882

or here:

https://github.com/htcondor/htcondor/blob/0222c71b4a7cf5946ab9d5caf5ecca0ca8c75539/src/condor_utils/classad_oldnew.cpp#L57-L130

Maybe the ReliSock, or the ClassAd...

Cheers,
daniel
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685


--
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736