Re: [HTCondor-devel] Need for dynamic slots in top-level collector?

Date: Fri, 10 Mar 2017 16:42:58 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Need for dynamic slots in top-level collector?
Hi folks,

Any wild ideas of how I could grease the top-level collector to at least survive the weekend?

We have a lot of cores in the pool currently (consistently over 200k), causing the OOM killer to fire about once a minute.
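
One stopgap I can think of is cutting the number of forked query workers, so fewer copies of a very large collector image are in flight at once.  A sketch (the knob is real; the value is a guess):

  # condor_config on the central manager
  COLLECTOR_QUERY_WORKERS = 1   # fewer forked children of a multi-GB collector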

Thanks!

Brian

> On Mar 10, 2017, at 1:07 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> 
> Hi Todd,
> 
> (1) helps.  (3) helps.
> 
> Not sure the others are interesting, though: we can emulate (2) operationally by hunting down the bad queries.
> 
> Fundamentally, though, the update rate is higher than the processing rate of a single collector.  We can see that because the UDP buffer size is over 250MB and we still drop updates.
> 
> We need to either decrease the update rate further, increase the processing rate, or distribute queries across multiple collectors.
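> 
> For concreteness, the knobs I would reach for first (names as I remember them; values illustrative):
> 
>   # Startd/master side: decrease the update rate.
>   UPDATE_INTERVAL = 900             # default is 300 seconds
>   MASTER_UPDATE_INTERVAL = 900      # likewise
>   # Collector side: the UDP buffer is already >250MB here, i.e. something like
>   COLLECTOR_SOCKET_BUFSIZE = 268435456   # OS sysctl limits may also cap this
>   # Or sidestep UDP loss entirely, at the cost of extra collector work:
>   UPDATE_COLLECTOR_WITH_TCP = True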
> 
> Brian
> 
> Sent from my iPhone
> 
>> On Mar 10, 2017, at 12:13 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
>> 
>>> On 3/10/2017 10:44 AM, Greg Thain wrote:
>>>> On 03/10/2017 10:33 AM, Brian Bockelman wrote:
>>>> Is it safe to assume that 25% of the updates are d-slot private ads?
>>>> Can we drop those?
>>> 
>>> We can drop all the private ads for claimed slots (dynamic or static),
>>> but the private ads are very small compared to the public ads, so I
>>> don't know how much of a difference that would make.
>>> 
>>> -greg
>> 
>> We can put the information contained in the private ads into the corresponding public ads as private attributes.  But as Greg says, I'm not sure it would help much.  And if preemption is disabled then, as Greg noted above, we don't need any private ads for dynamic slots (as they are all claimed).
>> 
>> In v8.7.0 we added statistics to the collector ad showing where the top-level collector is spending its time, along with stats on the number of fork workers.
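>>
>> (Those show up in the collector's own ad, so they can be pulled with, e.g.:
>>
>>   condor_status -collector -long
>>
>> against the top-level collector.)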
>> 
>> I think we should pursue the following:
>> 
>> 1. Once the collector has reached its forked child worker limit, it should queue up additional query requests and service them as child workers exit.  We could do this for v8.7.1 (in fact, I am hoping to do this next week, assuming the new stats in v8.7.0 show that this is going to help).
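>>
>> For illustration, the configuration surface might end up looking like this (COLLECTOR_QUERY_WORKERS already exists; the pending-queue knob name is purely hypothetical):
>>
>>   COLLECTOR_QUERY_WORKERS = 4           # existing cap on forked query handlers
>>   COLLECTOR_QUERY_WORKERS_PENDING = 50  # hypothetical: queries queued until a worker frees up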
>> 
>> 2. The collector should only respond to queries in-process if we "know" the response will be faster than forking.  Right now we make the decision to fork or not based solely on the table being queried.  I propose we make the following changes: (a) the collector should respond in-process only if the query is against a small table AND the query has a projection of fewer than X attributes, and (b) collector in-process results need to be sent back to the client using non-blocking I/O.  Item (a) is trivial and could happen for v8.7.1; item (b) is a bit more involved, but not too bad, since the collector only does one end-of-message after sending all the responses, so a non-blocking relisock can happily buffer the response (at the cost of RAM) without needing to deal with moving to shared or weak pointers to ads in the collector.
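>>
>> If we expose the item (a) heuristic to admins at all, I would imagine a knob along these lines (name and value purely hypothetical):
>>
>>   # Respond in-process only when the query hits a small table AND uses a
>>   # small projection; everything else forks (and, per item 1, queues).
>>   HANDLE_QUERY_IN_PROC_POLICY = small_table_and_query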
>> 
>> 3. If and only if preemption is disabled, then: (a) the accountant could get accounting information out of the pslot roll-up information, so child collectors could avoid sending dslots to the parent, and (b) there is no need for children to forward private ads for slots in the Claimed state.
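>>
>> Sketched as configuration (NEGOTIATOR_CONSIDER_PREEMPTION is the real precondition knob; the two forwarding knobs are hypothetical names for the behavior above):
>>
>>   NEGOTIATOR_CONSIDER_PREEMPTION = False         # precondition: preemption disabled
>>   COLLECTOR_FORWARD_DSLOT_ADS = False            # hypothetical: children send only pslot roll-ups
>>   COLLECTOR_FORWARD_CLAIMED_PRIVATE_ADS = False  # hypothetical: no private ads for Claimed slots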
>> 
>> 4. Make the "collector tree" central manager setup a first-class configuration that only requires the admin to state something simple, like the maximum size of their pool and/or the number of child collectors.  If HTCondor always configures the collector tree in a specific manner, we can leverage that to our benefit instead of trying to make things better given every possible way folks could set things up.  We could, for instance, always set up two top-level collectors: one just for operations (the negotiator) and one or more for monitoring (condor_status).  (Yes, this is trading off RAM for performance.)  We could have the shared_port daemon forward updates to specific child collectors (removing the complexity of a collector-tree config on the startd/schedd machines).  We could have CCB always run in a separate set of processes.  Etc., etc.
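>>
>> For reference, the manual version of the tree today looks roughly like this on the central manager (ports, names, and hostnames illustrative); the point of item 4 is to generate all of it from one or two knobs:
>>
>>   # Two child collectors behind the top-level one.
>>   COLLECTOR2 = $(COLLECTOR)
>>   COLLECTOR3 = $(COLLECTOR)
>>   COLLECTOR2_ARGS = -f -p 9620
>>   COLLECTOR3_ARGS = -f -p 9621
>>   COLLECTOR2_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector2Log"
>>   COLLECTOR3_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector3Log"
>>   DAEMON_LIST = $(DAEMON_LIST) COLLECTOR2 COLLECTOR3
>>   # Child collectors forward their ads up to the top-level collector:
>>   CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
>>
>> with startds pointed at a random child, e.g. COLLECTOR_HOST = $RANDOM_CHOICE(cm.example.com:9620, cm.example.com:9621) on the worker nodes.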
>> 
>> regards,
>> Todd
>> 
>> 
>> -- 
>> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
>> Center for High Throughput Computing   Department of Computer Sciences
>> HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
>> Phone: (608) 263-7132                  Madison, WI 53706-1685
> 
> _______________________________________________
> HTCondor-devel mailing list
> HTCondor-devel@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
