Re: [HTCondor-devel] Need for dynamic slots in top-level collector?


Date: Mon, 13 Mar 2017 11:01:42 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Need for dynamic slots in top-level collector?
> On Mar 10, 2017, at 10:24 AM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> 
> 
>> On Mar 10, 2017, at 11:17 AM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
>> 
>> On 03/10/2017 10:12 AM, Brian Bockelman wrote:
>>> Hi,
>>> 
>>> CMS is just getting nailed by dynamic slot updates to our top-level collector.
>>> 
>>> Are these actually needed for negotiation?  Are they only being used by the accountant?  Is there any easy way to utilize the information in the parent slot instead?
>> 
>> When preemption is disabled, the d-slots are only used by the accountant.  And by "nailed", I assume you mean updates?  
> 
> Two things are apparent looking at the collector:
> 1) Update rates to the top-level collector are still too high.
> 2) Combined with (1), the sheer number of ads + high update rate means that doing a fork of the collector is expensive in terms of RAM.  Fewer ads in the collector / less turnover of slots would help.
>  - Since the top-level collector stops responding if it cannot fork more workers, we are effectively forced to have an unbounded number of query workers.
> 

I forgot to mention --

ClassAd caching probably really hurts us here, right?  The cache envelope objects are all ref-counted, meaning that indexing a new ad is going to touch the pages of a lot of other ads -- versus just CoW'ing pages associated with the ad.

Looking at the private-vs-shared memory, 1GB/s growth is a reasonable estimate.

I think, in the end, the CoW approach just wasn't designed for sites with an update rate of O(150 Hz).

Brian

RecentUpdatesTotal = 167862
RecentUpdatesTotal_Accouting = 863
RecentUpdatesTotal_Collector = 264
RecentUpdatesTotal_glidefrontendmonitor = 1
RecentUpdatesTotal_glideresource = 2466
RecentUpdatesTotal_HAD = 11
RecentUpdatesTotal_Job_Router = 13
RecentUpdatesTotal_Master = 24
RecentUpdatesTotal_Negotiator = 7
RecentUpdatesTotal_Replication = 18
RecentUpdatesTotal_Schedd = 310
RecentUpdatesTotal_Start = 151788
RecentUpdatesTotal_Submittor = 12097
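
For anyone who wants to turn the counters above into rates, here is a rough sketch. The counts are copied from the list above; the 1200-second window is an assumption (I believe it is the default length of the "Recent" statistics window, but check the configuration on your collector), and the helper names are made up for illustration.

```python
# Per-ad-type "Recent" update counters, copied from the list above.
# Keys are the suffixes of RecentUpdatesTotal_<type>, spelled as reported.
RECENT_UPDATES = {
    "Accouting": 863,
    "Collector": 264,
    "glidefrontendmonitor": 1,
    "glideresource": 2466,
    "HAD": 11,
    "Job_Router": 13,
    "Master": 24,
    "Negotiator": 7,
    "Replication": 18,
    "Schedd": 310,
    "Start": 151788,
    "Submittor": 12097,
}
RECENT_UPDATES_TOTAL = 167862  # RecentUpdatesTotal from the list above

# ASSUMPTION: the "Recent" window is 1200 seconds. This value is not stated
# in the message; adjust it to match the collector's actual configuration.
WINDOW_SECONDS = 1200


def update_rate_hz(count, window_seconds=WINDOW_SECONDS):
    """Average updates per second over the statistics window."""
    return count / window_seconds


def share_of_total(count, total=RECENT_UPDATES_TOTAL):
    """Fraction of all recent updates contributed by one ad type."""
    return count / total


if __name__ == "__main__":
    print(f"total: {update_rate_hz(RECENT_UPDATES_TOTAL):.1f} Hz")
    # List ad types from busiest to quietest.
    for name, count in sorted(RECENT_UPDATES.items(), key=lambda kv: -kv[1]):
        print(f"{name:22s} {update_rate_hz(count):8.2f} Hz "
              f"{100 * share_of_total(count):6.2f}%")
```

Under that assumed window the total works out to roughly 140 updates per second, with the Start (startd slot) ads making up about 90% of it, which is consistent with the O(150Hz) figure.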
