> On Mar 10, 2017, at 10:24 AM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
>
>
>> On Mar 10, 2017, at 11:17 AM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
>>
>> On 03/10/2017 10:12 AM, Brian Bockelman wrote:
>>> Hi,
>>>
>>> CMS is just getting nailed by dynamic slot updates to our top-level collector.
>>>
>>> Are these actually needed for negotiation? Are they only being used by the accountant? Is there any easy way to utilize the information in the parent slot instead?
>>
>> When preemption is disabled, the d-slots are only used by the accountant. And by "nailed", I assume you mean updates?
>
> Two things are apparent looking at the collector:
> 1) Update rates to the top-level collector are still too high.
> 2) Combined with (1), the sheer number of ads plus the high update rate means that forking the collector is expensive in terms of RAM. Fewer ads in the collector / less turnover of slots would help.
> - Since the top-level collector stops responding if it cannot fork more workers, we are effectively forced to allow an unbounded number of query workers.
>
I forgot to mention --
ClassAd caching probably really hurts us here, right? The cache envelope objects are all ref-counted, meaning that indexing a new ad is going to touch the pages of a lot of other ads -- versus just CoW'ing pages associated with the ad.
Looking at the private-vs-shared memory, 1GB/s growth is a reasonable estimate.
I think, in the end, the CoW approach just wasn't designed for sites with an update rate of O(150Hz).
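To illustrate the ref-counting point, here is a minimal Python analogy. It is not the HTCondor ClassAd cache (which is C++); the `CachedExpr` class and the slot names are hypothetical. It only shows that adding a new ad which reuses a cached object performs a write to that shared object's header, which is the kind of write that forces a page copy in a forked process.

```python
import sys


class CachedExpr:
    """Hypothetical stand-in for a shared, ref-counted cache entry."""

    def __init__(self, text):
        self.text = text


# One cached expression shared by many ads (an assumption for illustration).
shared = CachedExpr("TARGET.RequestMemory <= MY.Memory")

ads = {"slot1_1": {"Requirements": shared}}
before = sys.getrefcount(shared)

# Indexing a new ad that reuses the cached entry bumps its reference count.
# In CPython the count lives in the object's own header, so this is a write
# to memory that belongs to the shared object, not to the new ad.
ads["slot1_2"] = {"Requirements": shared}
after = sys.getrefcount(shared)

print("refcount before:", before, "after:", after)
```

After a fork, a write like this lands on a page the child still shares, so the kernel has to copy that page even though the cached entry's content did not change.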
Brian
RecentUpdatesTotal = 167862
RecentUpdatesTotal_Accouting = 863
RecentUpdatesTotal_Collector = 264
RecentUpdatesTotal_glidefrontendmonitor = 1
RecentUpdatesTotal_glideresource = 2466
RecentUpdatesTotal_HAD = 11
RecentUpdatesTotal_Job_Router = 13
RecentUpdatesTotal_Master = 24
RecentUpdatesTotal_Negotiator = 7
RecentUpdatesTotal_Replication = 18
RecentUpdatesTotal_Schedd = 310
RecentUpdatesTotal_Start = 151788
RecentUpdatesTotal_Submittor = 12097
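As a rough sketch, the counters above can be broken down per ad type to see where the updates come from. The window length `RECENT_WINDOW_SECONDS` is an assumption (1200 seconds); the message does not state it, so the resulting rate is only indicative. The helper names are hypothetical.

```python
# Assumed length of the "Recent" statistics window, in seconds.
# This value is NOT given in the message; adjust it to the real setting.
RECENT_WINDOW_SECONDS = 1200

PREFIX = "RecentUpdatesTotal"

STATS = """\
RecentUpdatesTotal = 167862
RecentUpdatesTotal_Accouting = 863
RecentUpdatesTotal_Collector = 264
RecentUpdatesTotal_glidefrontendmonitor = 1
RecentUpdatesTotal_glideresource = 2466
RecentUpdatesTotal_HAD = 11
RecentUpdatesTotal_Job_Router = 13
RecentUpdatesTotal_Master = 24
RecentUpdatesTotal_Negotiator = 7
RecentUpdatesTotal_Replication = 18
RecentUpdatesTotal_Schedd = 310
RecentUpdatesTotal_Start = 151788
RecentUpdatesTotal_Submittor = 12097
"""


def parse_stats(text):
    """Parse 'Name = value' lines into a dict of integer counters."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            counts[name.strip()] = int(value)
    return counts


def per_type_counts(counts):
    """Return the counters keyed by ad type, without the overall total."""
    return {
        name[len(PREFIX) + 1:]: value
        for name, value in counts.items()
        if name != PREFIX
    }


counts = parse_stats(STATS)
total = counts[PREFIX]
by_type = per_type_counts(counts)

# List ad types from largest to smallest share of the total.
for ad_type, value in sorted(by_type.items(), key=lambda kv: -kv[1]):
    print(f"{ad_type:24s} {value:8d} {value / total:7.2%}")

start_share = by_type["Start"] / total
rate_hz = total / RECENT_WINDOW_SECONDS
print(f"overall rate under the assumed window: {rate_hz:.1f} Hz")
```

The per-type counters sum exactly to `RecentUpdatesTotal`, and the `Start` ads (startd slot updates) account for roughly 90% of the total. Under the assumed 1200-second window the overall rate is about 140 Hz, in line with the O(150Hz) figure quoted in the message.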