Re: [Condor-devel] Thoughts on decreasing shadow memory use
- Date: Wed, 18 Jul 2012 07:18:28 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [Condor-devel] Thoughts on decreasing shadow memory use
On Jul 17, 2012, at 8:16 PM, Tim St Clair wrote:
> inline below.
>
> ----- Original Message -----
>> From: "Brian Bockelman" <bbockelm@xxxxxxxxxxx>
>> To: "Condor Developers" <condor-devel@xxxxxxxxxxx>
>> Sent: Tuesday, July 17, 2012 6:37:45 PM
>> Subject: [Condor-devel] Thoughts on decreasing shadow memory use
>>
>> Hi,
>>
>> When I last talked to Miron about multi-shadow, he suggested first
>> wringing every last byte out of the current one before even
>> proposing the multi-shadow. So, I spent about an hour with igprof
>> and staring at smaps.
>>
>> I measured a shadow as having 360KB of heap, about 550KB total
>> unshared space, and 274KB of data live on the heap (so about 25%
>> waste due to fragmentation).
>>
>> Here's what I found that we could save. List is in ascending order
>> of difficulty to implement.
>> 0) Turn off classad caching: 55KB.
>
> All for disabling it on the shadows; honestly, it only really makes sense for the schedd and collector at this point. (Luckily Jaime added a knob.)
>
Yes - despite me getting cranky with Jaime for adding another knob, I found it useful for testing this.
>> 1) Copy of job's classad inside the file transfer object: 8KB
>> 2) gethostbyaddr -> gethostbyaddr_r (including all callsites, even in
>> the logging code! See ExecuteEvent::writeEvent): 5KB.
>> 3) getpwnam, getpwuid to reentrant versions: 2KB
>> 4) Remove stats object from DaemonCore for shadow: 7KB
>> 5) libcondor_utils has 156KB of dirty writable memory (non-const
>> statics?) that can't be shared: 100KB? This part was not included
>> in my heap calculations, but is indeed non-shared.
>
> This seems just odd to me. I may dig more on this; could you file a ticket?
From digging around, the entire param table and all of the ATTR_* symbols live in the writable (non-shared) data segment, and hence become private memory in every shadow.
The simplest way forward seems to be:
- Change ATTR_* to const char [] and inline the symbols. Right now, we keep N+1 copies of the symbol name and one copy of the symbol contents in memory (the symbol name and contents are about the same size; N is the number of uses of the ATTR_*). Inlining would reduce this to just N.
- Change the param table to be based on macros (plain symbols can be in the read-only section; structs have to be initialized), and load it on the first use.
Having the attribute names and the param table non-shared is a rather embarrassing way to spend 25% of the shadow's memory budget!
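To make the ATTR_* point concrete, here is a minimal sketch of the two declaration styles. The attribute name is hypothetical and this is not the actual condor_utils source; the comments describe what typical ELF/Linux toolchains do, which is an assumption worth checking with nm or readelf on a real build.

```cpp
#include <cstring>

// Hypothetical attribute name, for illustration only.

// Pointer form: the literal's bytes are read-only, but the pointer itself is
// a separate writable object. In a position-independent shared library it
// also needs a load-time relocation, so its page is dirtied in each process.
const char *ATTR_EXAMPLE_PTR = "ExampleAttr";

// Array form: the bytes are the object, with no pointer and no relocation,
// so the data can sit in a read-only section shared by every process.
// A namespace-scope const array has internal linkage in C++, so placing this
// in a header gives each translation unit that uses it its own copy.
const char ATTR_EXAMPLE_ARR[] = "ExampleAttr";

// Call sites do not change: both forms decay to const char *.
bool same_attr(const char *a, const char *b) {
    return std::strcmp(a, b) == 0;
}
```

The tradeoff is the one described above: the array form drops the pointer and its writable page, at the cost of one copy of the string per user.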
>
>> 6) Cleanup of auth code to reduce heap fragmentation: 5-15KB
>> 7) Un-loading the IpVerify table after usage: 9KB.
>> 8) The configuration subsystem. This would be one tough nugget to
>> crack (note: would all be shared with the multi-shadow), but is very
>> lightly used after the shadow fires up. 70KB.
>
> If we further minimize the config files and stuff as much into the param table as possible it should reduce *this.
>
I hope you have some ideas, because I was relatively stumped on how to do this nicely. That's why I ranked it so low.
I did think about having the schedd pass parts of the config as a sub-classad of the first job ad. Then, we could delay initializing the param table until the first time the shadow gets a SIGHUP or hits a parameter it doesn't know.
Sounds complicated though!
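A rough sketch of the delayed-initialization idea, assuming the schedd hands over a small seed of parameters. Every name here (LazyParamTable, the loader callback, on_sighup) is hypothetical and is not the existing HTCondor param API.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using ParamMap = std::map<std::string, std::string>;

// Serves lookups from a small seed and builds the full table only when
// forced to: on a seed miss or on a reconfig request.
class LazyParamTable {
public:
    LazyParamTable(ParamMap seed, std::function<ParamMap()> loader)
        : seed_(std::move(seed)), loader_(std::move(loader)) {}

    // Returns the value, or an empty string if the parameter is unknown.
    std::string lookup(const std::string &name) {
        if (!loaded_) {
            auto hit = seed_.find(name);
            if (hit != seed_.end()) return hit->second;
            load_full_table();  // first parameter the seed does not know
        }
        auto hit = full_.find(name);
        return hit != full_.end() ? hit->second : std::string();
    }

    // Called from the SIGHUP handling path: always (re)reads the full config.
    void on_sighup() { load_full_table(); }

    bool full_table_loaded() const { return loaded_; }

private:
    void load_full_table() {
        full_ = loader_();
        loaded_ = true;
        seed_.clear();  // assumption: the full table wins once it exists
    }

    ParamMap seed_;
    ParamMap full_;
    std::function<ParamMap()> loader_;
    bool loaded_ = false;
};
```

A shadow that only ever asks for seeded parameters would never pay for the full table. Whether the seed can be kept consistent with the real config files is the complicated part, and this sketch does not address it.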
Brian