On 3/10/2017 10:44 AM, Greg Thain wrote:
> On 03/10/2017 10:33 AM, Brian Bockelman wrote:
>> Is it safe to assume that 25% of the updates are d-slot private ads?
>> Can we drop those?
>
> We can drop all the private ads for claimed slots (dynamic or static),
> but the private ads are very small compared to the public ads, so I
> don't know how much of a difference that would make.
>
> -greg

We can put the information contained in the private ads into the
corresponding public ads as private attributes. But as Greg says, not
sure that it would help much. And if preemption is disabled, as Greg
noted above, we don't need any private ads for dynamic slots (as they
are all claimed).
In v8.7.0 we have additional statistics in the collector ad about where
the top-level collector is spending time and stats on the number of fork
workers.
I think we should pursue the following:
1. Once the collector has reached its forked child worker limit, it
should queue up additional query requests and service them as child
workers exit. We could do this for v8.7.1 (in fact, I am hoping to do
this next week, assuming the new stats in v8.7.0 show that this is going
to help).
2. The collector should only respond to queries in-process if we "know"
the response will be faster than forking. Right now we make the
decision to fork or not based on the table being queried. I propose we
make the following changes: (a) collector should respond in-process only
if the query is for a small table AND the query has a projection of less
than X attributes, and (b) collector in-process results need to be sent
back to the client using non-blocking I/O. Item (a) is trivial and
could happen for v8.7.1; item (b) is a bit more involved, but not too
bad, since happily the collector only does one end-of-message after
sending all the responses, so a non-blocking relisock can simply buffer
the response (at the cost of RAM) without needing to deal with moving to
shared or weak pointers to ads in the collector.
3. If and only if preemption is disabled, then: (1) the accountant could
get accounting information out of pslot roll-up information, so child
collectors could avoid sending dslots to the parent, and (2) no need for
children to forward private ads for slots in Claimed state.
4. Make the "collector tree" central manager setup a first-class
configuration that only requires the admin to state something simple
like the max size of their pool and/or the number of child collectors.
If HTCondor always configures the collector tree in a specific manner,
we can leverage that to our benefit instead of trying to make things
better given any possible way folks could set things up. We could, for
instance, always set up two top-level collectors, one just for operations
(the negotiator) and one or more for monitoring (condor_status). (Yes,
this is trading off RAM for performance). We could have the shared_port
forward updates to specific child collectors (removing the complexity of
a collector-tree config at the startd/schedd machines). We could have
CCB always in a separate set of processes. Etc Etc.
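
To make items 1 and 2(a) concrete, here is a rough sketch of the dispatch
logic I have in mind. It is Python pseudocode for illustration only, not
collector code: the names (Query, Dispatcher, SMALL_TABLES), the table
names, and the threshold values are all made up, and treating an empty
projection as "all attributes" is my assumption.

```python
from collections import deque
from dataclasses import dataclass, field

# Everything below is hypothetical: names and limits are placeholders.
SMALL_TABLES = {"Collector", "Negotiator", "Scheduler"}  # assumed "small" tables
MAX_PROJECTION = 10  # the "X" in item 2(a); the real value is undecided
MAX_WORKERS = 4      # forked child worker limit; value chosen for the example


@dataclass
class Query:
    table: str
    # Assumption: an empty projection means "return all attributes".
    projection: list = field(default_factory=list)


def answer_in_process(q: Query) -> bool:
    """Item 2(a): in-process only for a small table AND a short projection."""
    return q.table in SMALL_TABLES and 0 < len(q.projection) < MAX_PROJECTION


class Dispatcher:
    def __init__(self):
        self.active_workers = 0
        self.pending = deque()  # item 1: queries waiting for a free worker

    def handle(self, q: Query) -> str:
        """Decide how a query is serviced and update the bookkeeping."""
        if answer_in_process(q):
            return "in-process"
        if self.active_workers < MAX_WORKERS:
            self.active_workers += 1  # a real collector would fork here
            return "forked"
        self.pending.append(q)  # at the limit: queue instead of blocking
        return "queued"

    def worker_exited(self):
        """Called when a child worker exits; starts the oldest queued query.

        Returns the query handed to a new worker, or None if nothing waits.
        """
        self.active_workers -= 1
        if self.pending:
            q = self.pending.popleft()
            self.active_workers += 1
            return q
        return None
```

The point of the sketch is that queued queries are serviced in arrival
order as workers exit, and that a query with no projection always goes to
a forked worker even when the table is small.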
regards,
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
HTCondor Technical Lead 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685