Re: [HTCondor-devel] Need for dynamic slots in top-level collector?


Date: Mon, 13 Mar 2017 09:31:12 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Need for dynamic slots in top-level collector?
Hi Greg,

I would be OK if the collector never handled a query inline - either doing an EAGAIN or just queuing them up internally for some finite amount of time.

In this case, the query was inlined because the child processes pushed the collector into swap.  Once swap was involved and memory operations took seconds, well, the whole thing kind of went to hell until the master shot the involved processes.

Going in a different direction, I tried a "poor man's profiling" (pstack repeatedly) and found lots of places to potentially squeeze out performance.

- ConvertDefaultIPToSocketIP is called twice for every attribute in the ad.  This does a bunch of string manipulation, but _also_ does at least one param_boolean call per invocation.  Nothing like doing this a few million times per query:

#0  0x00007f85963e0f18 in next_config_macro () from /usr/lib64/libcondor_utils_8_7_0.so
#1  0x00007f8596418803 in expand_macro(char const*, macro_set&, macro_eval_context&) () from /usr/lib64/libcondor_utils_8_7_0.so
#2  0x00007f85964e3317 in param_ctx () from /usr/lib64/libcondor_utils_8_7_0.so
#3  0x00007f85964e4749 in param () from /usr/lib64/libcondor_utils_8_7_0.so
#4  0x00007f85964e4a8e in param_boolean(char const*, bool, bool, compat_classad::ClassAd*, compat_classad::ClassAd*, bool) () from /usr/lib64/libcondor_utils_8_7_0.so
#5  0x00007f85963ea346 in ConvertDefaultIPToSocketIP(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, Stream&) () from /usr/lib64/libcondor_utils_8_7_0.so

- GetInternalReferences looks surprisingly expensive.  Since the negotiator typically whitelists attributes for claimed ads, is there room for improvement?

- While it obviously showed up, I was surprised by how small of a percentage of the time was spent in classad::ClassAdUnParser::Unparse.  Maybe 30%?  50%?

- prepare_crypto_for_secret_is_noop() involves a modest amount of string parsing, is called once per attribute, yet is a property of the connection itself.  It only needs to be called once per invocation of putClassAd - perhaps can be cached too?

- The std::string buffered output is released / allocated between each attribute.  It might be reasonable to recycle the allocation on the heap and reserve a modest amount of memory (1KB?)?

- There's a bit of slop in the way we evaluate string values.  Consider the following traceback:

#1  0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) () from /usr/lib64/libclassad.so.9
#2  0x00007f8595e93580 in classad::Literal::_Evaluate(classad::EvalState&, classad::Value&) const () from /usr/lib64/libclassad.so.9
#3  0x00007f8595e82501 in classad::ExprTree::Evaluate(classad::EvalState&, classad::Value&) const () from /usr/lib64/libclassad.so.9
#4  0x00007f8595e66ea0 in classad::ClassAd::EvaluateAttr(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, classad::Value&) const () from /usr/lib64/libclassad.so.9
#5  0x00007f8595e66fe9 in classad::ClassAd::EvaluateAttrString(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) const () from /usr/lib64/libclassad.so.9

I think we copy the string twice time: once from the literal to the value, then from the value to the user-provided string (which is reallocated for each attribute).  It would be reasonable to short-circuit this (check to see if the attribute's expression / cached expression is a literal string type, then make a copy directly to the user buffer); it might be possible to avoid it in all cases.  Given that we have const-ness all around, we could take a step further and introduce a EvaluateLiteralString that does zero-copy for strings.

There remains room for improvement...

Brian

#4  0x00007f8595e67e9e in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#5  0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#6  0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#7  0x00007f8595e67b22 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#8  0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#9  0x00007f8595e67b22 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#10 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#11 0x00007f8595e67b22 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#12 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#13 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#14 0x00007f8595e67b22 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#15 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#16 0x00007f8595e67e2a in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#17 0x00007f8595e67f48 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#18 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#19 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#20 0x00007f8595e67af6 in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#21 0x00007f8595e67e2a in classad::ClassAd::_GetInternalReferences(classad::ExprTree const*, classad::ClassAd const*, classad::EvalState&, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#22 0x00007f8595e680b3 in classad::ClassAd::GetInternalReferences(classad::ExprTree const*, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) const () from /usr/lib64/libclassad.so.9
#23 0x00007f85964da972 in putClassAd(Stream*, classad::ClassAd&, int, std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*) () from /usr/lib64/libcondor_utils_8_7_0.so


> On Mar 11, 2017, at 10:39 AM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
> 
> On 03/11/2017 10:11 AM, Todd Tannenbaum wrote:
>> 
>> At first blush, looks to me like the network between the CERN central manager and the clients doing queries occasionally becomes congested/slow. When this happens, forked query workers take a really long time to do their thing as evidenced by the massive numbers for the MissedForkRuntime* stats ("MissedFork" = queries that we did in-process because we would exceeded the max number of forked workers).
>> 
>> regards,
>> Todd
> 
> Thanks, Todd.  So, "HandleQueryForked" and "HandleQueryForkedMissed" are counts, right?  So, we only handled 7 queries inline because of too many forked children out of 20234 requests in this snippet.  Of those 7, though, one took 2124 seconds to complete, but of the 20227 forked queries, the slowest was 19 seconds?  Perhaps the collector should never handle a query inline, just return a EAGAIN error to the client.
>> 
>> Stats from CERN central manager (after running about about 2 days) :
>> 
>> 
>> HandleQueryForked = 20227
>> HandleQueryForkedRuntimeMax = 19.9297046919819
>> HandleQueryForkedRuntimeMin = 0.005120144924148917
>> HandleQueryMissedFork = 7
>> HandleQueryMissedForkRuntimeMax = 2124.699060306884
>> HandleQueryMissedForkRuntimeMin = 0.001575499074533582 
> 

[← Prev in Thread] Current Thread [Next in Thread→]