Re: [HTCondor-devel] Need for dynamic slots in top-level collector?


Date: Mon, 13 Mar 2017 17:14:10 +0000
From: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Need for dynamic slots in top-level collector?
Oh. I see it.  That code was never tested, but it *might* give some gains.   On the other hand. For the versions of glibc that we use  std::strings are copy on reference, so most of those copies aren't really that costly.
 
-----Original Message-----
From: HTCondor-devel [mailto:htcondor-devel-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Monday, March 13, 2017 11:54 AM
To: Brian Bockelman <bbockelm@xxxxxxxxxxx>; Greg Thain <gthain@xxxxxxxxxxx>
Cc: htcondor-devel@xxxxxxxxxxx
Subject: Re: [HTCondor-devel] Need for dynamic slots in top-level collector?

Which #ifdef are you referring to?

-----Original Message-----
From: HTCondor-devel [mailto:htcondor-devel-bounces@xxxxxxxxxxx] On Behalf Of Brian Bockelman
Sent: Monday, March 13, 2017 10:14 AM
To: Greg Thain <gthain@xxxxxxxxxxx>
Cc: htcondor-devel@xxxxxxxxxxx
Subject: Re: [HTCondor-devel] Need for dynamic slots in top-level collector?


> On Mar 13, 2017, at 9:31 AM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> 
> Hi Greg,
> 
> I would be OK if the collector never handled a query inline - either doing an EAGAIN or just queuing them up internally for some finite amount of time.
> 
> In this case, the query was inlined because the child processes pushed the collector into swap.  Once swap was involved and memory operations took seconds, well, the whole thing kind of went to hell until the master shot the involved processes.
> 
> Going in a different direction, I tried a "poor man's profiling" (pstack repeatedly) and found lots of places to potentially squeeze out performance.
> 
> - ConvertDefaultIPToSocketIP is called twice for every attribute in the ad.  This does a bunch of string manipulation, but _also_ does at least one param_boolean call per invocation.  Nothing like doing this a few million times per query:
> 
> #0  0x00007f85963e0f18 in next_config_macro () from /usr/lib64/libcondor_utils_8_7_0.so
> #1  0x00007f8596418803 in expand_macro(char const*, macro_set&, macro_eval_context&) () from /usr/lib64/libcondor_utils_8_7_0.so
> #2  0x00007f85964e3317 in param_ctx () from /usr/lib64/libcondor_utils_8_7_0.so
> #3  0x00007f85964e4749 in param () from /usr/lib64/libcondor_utils_8_7_0.so
> #4  0x00007f85964e4a8e in param_boolean(char const*, bool, bool, compat_classad::ClassAd*, compat_classad::ClassAd*, bool) () from /usr/lib64/libcondor_utils_8_7_0.so
> #5  0x00007f85963ea346 in ConvertDefaultIPToSocketIP(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, Stream&) () from /usr/lib64/libcondor_utils_8_7_0.so
> 
> - GetInternalReferences looks surprisingly expensive.  Since the negotiator typically whitelists attributes for claimed ads, is there room for improvement?
> 
> - While it obviously showed up, I was surprised by how small of a percentage of the time was spent in classad::ClassAdUnParser::Unparse.  Maybe 30%?  50%?
> 
> - prepare_crypto_for_secret_is_noop() involves a modest amount of string parsing, is called once per attribute, yet is a property of the connection itself.  It only needs to be called once per invocation of putClassAd - perhaps can be cached too?
> 
> - The std::string buffered output is released / allocated between each attribute.  It might be reasonable to recycle the allocation on the heap and reserve a modest amount of memory (1KB?)?
> 
> - There's a bit of slop in the way we evaluate string values.  Consider the following traceback:
> 
> #1  0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) () from /usr/lib64/libclassad.so.9
> #2  0x00007f8595e93580 in classad::Literal::_Evaluate(classad::EvalState&, classad::Value&) const () from /usr/lib64/libclassad.so.9
> #3  0x00007f8595e82501 in classad::ExprTree::Evaluate(classad::EvalState&, classad::Value&) const () from /usr/lib64/libclassad.so.9
> #4  0x00007f8595e66ea0 in classad::ClassAd::EvaluateAttr(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, classad::Value&) const () from /usr/lib64/libclassad.so.9
> #5  0x00007f8595e66fe9 in classad::ClassAd::EvaluateAttrString(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) const () from /usr/lib64/libclassad.so.9
> 
> I think we copy the string twice time: once from the literal to the value, then from the value to the user-provided string (which is reallocated for each attribute).  It would be reasonable to short-circuit this (check to see if the attribute's expression / cached expression is a literal string type, then make a copy directly to the user buffer); it might be possible to avoid it in all cases.  Given that we have const-ness all around, we could take a step further and introduce a EvaluateLiteralString that does zero-copy for strings.
> 

There's another twist on this problem in Unparse:

#0  0x00007f859513d258 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#1  0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) () from /usr/lib64/libclassad.so.9
#2  0x00007f8595e9d647 in classad::ClassAdUnParser::Unparse(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, classad::ExprTree const*) () from /usr/lib64/libclassad.so.9
#3  0x00007f85964da5d6 in _putClassAd(Stream*, classad::ClassAd&, int) () from /usr/lib64/libcondor_utils_8_7_0.so
#4  0x00007f85964daa78 in putClassAd(Stream*, classad::ClassAd&, int, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*) () from /usr/lib64/libcondor_utils_8_7_0.so

We make a copy of the classad::Value (memory allocation, copy the string) in `Unparse`, then make a copy of the string (memory allocation, copy the string), then copy the string to the user-provided buffer (memory allocation, copy the string).  Oh - and finally, we make a copy in putClassAd to the CEDAR socket buffer.

Seems TJ already discovered this when pulling the Factor out of the value, but left the fix #ifdef'd out?

Brian

_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
[← Prev in Thread] Current Thread [Next in Thread→]