HTCondor Project List Archives




Re: [Condor-devel] Thoughts on decreasing shadow memory use



On Jul 17, 2012, at 8:16 PM, Tim St Clair wrote:

> inline below. 
> 
> ----- Original Message -----
>> From: "Brian Bockelman" <bbockelm@xxxxxxxxxxx>
>> To: "Condor Developers" <condor-devel@xxxxxxxxxxx>
>> Sent: Tuesday, July 17, 2012 6:37:45 PM
>> Subject: [Condor-devel] Thoughts on decreasing shadow memory use
>> 
>> Hi,
>> 
>> When I last talked to Miron about multi-shadow, he suggested first
>> wringing every last byte out of the current one before even
>> proposing the multi-shadow.  So, I spent about an hour with igprof
>> and staring at smaps.
>> 
>> I measured a shadow as having 360KB of heap, about 550KB total
>> unshared space, and 274KB of data live on the heap (so about 25%
>> waste due to fragmentation).
>> 
>> Here's what I found that we could save.  List is in ascending order
>> of difficulty to implement.
>> 0) Turn off classad caching: 55KB.
> 
> I'm all for disabling it on the shadows; honestly, it only really makes sense for the schedd and collector at this point. (Luckily Jaime added a knob.)
> 

Yes - despite me getting cranky with Jaime for adding another knob, I found it useful for testing this.

>> 1) Copy of job's classad inside the file transfer object: 8KB
>> 2) gethostbyaddr -> gethostbyaddr_r (including all callsites, even in
>> the logging code!  See ExecuteEvent::writeEvent): 5KB.
>> 3) getpwnam, getpwuid to reentrant versions: 2KB
>> 4) Remove stats object from DaemonCore for shadow: 7KB
>> 5) libcondor_utils has 156KB of dirty writable memory (non-const
>> statics?) that can't be shared: 100KB?  This part was not included
>> in my heap calculations, but is indeed non-shared.
> 
> This just seems odd to me. I may dig more on this; could you file a ticket?

From digging around: the entire param table and all of the ATTR_* symbols live in the writable (non-shared) data segment, so every shadow ends up with its own private copy of them.

The simplest way forward seems to be:
- Change ATTR_* to const char [] and inline the symbols.  Right now, we keep N+1 copies of the symbol name and one copy of the symbol contents in memory (the symbol name and contents are about the same size; N is the number of uses of the ATTR_*).  Inlining would reduce this to just N.
- Change the param table to be based on macros (plain symbols can be in the read-only section; structs have to be initialized), and load it on the first use.

Having the attribute names and the param table non-shared is a rather embarrassing way to spend 25% of the shadow's memory budget!
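
To make the first bullet concrete, here is a minimal sketch of the two declaration styles. The name ATTR_EXAMPLE_* is hypothetical and is not taken from the Condor sources; it only illustrates the general point. In a position-independent shared library, a global pointer to a string literal typically needs a load-time relocation, so the page holding the pointer is written by the dynamic loader and becomes private to each process. A const char array is just bytes, so it can normally stay in shared read-only data.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical attribute name, for illustration only.

// Pointer style: the string bytes are read-only, but the pointer variable is
// a separate writable object. In a position-independent library the loader
// usually has to patch it at startup, which dirties the page it lives on.
const char *ATTR_EXAMPLE_PTR = "ExampleAttr";

// Array style: the object is only the bytes of the name, with no pointer to
// relocate. A const array at namespace scope also has internal linkage in
// C++, so it can be defined in a header and used directly by each file.
const char ATTR_EXAMPLE_ARR[] = "ExampleAttr";

// Both styles give callers the same C string.
bool attr_names_match() {
    return std::strcmp(ATTR_EXAMPLE_PTR, ATTR_EXAMPLE_ARR) == 0;
}

// Size of the array form: the characters plus the terminating NUL.
std::size_t attr_array_bytes() {
    return sizeof(ATTR_EXAMPLE_ARR);
}
```

Whether a given build really places these objects in different sections depends on the compiler and linker flags, so it is worth confirming with smaps or a section dump before and after such a change.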

> 
>> 6) Cleanup of auth code to reduce heap fragmentation: 5-15KB
>> 7) Un-loading the IpVerify table after usage: 9KB.
>> 8) The configuration subsystem.  This would be one tough nugget to
>> crack (note: would all be shared with the multi-shadow), but is very
>> lightly used after the shadow fires up.  70KB.
> 
> If we further minimize the config files and stuff as much as possible into the param table, it should reduce this.
> 

I hope you have some ideas, because I was relatively stumped on how to do this nicely.  That's why I ranked it so low.

I did think about having the schedd pass parts of the config as a sub-classad of the first job ad.   Then, we could delay initializing the param table until the first time the shadow gets a SIGHUP or hits a parameter it doesn't know.

Sounds complicated though!
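
A rough sketch of that lazy-load idea, under stated assumptions: the class LazyParams, its method names, and the loader callback are all hypothetical and do not correspond to Condor's real param API. The seeded values stand in for the config subset the schedd would pass along with the first job ad, and the loader stands in for the full (expensive) config parse.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical lazy parameter table, not Condor's actual configuration code.
class LazyParams {
public:
    using Table = std::map<std::string, std::string>;

    // seeded: values handed over up front (e.g. by the schedd).
    // loader: builds the full table; only called when really needed.
    LazyParams(Table seeded, std::function<Table()> loader)
        : seeded_(std::move(seeded)), loader_(std::move(loader)) {}

    // Look up a parameter. A miss in the seeded values triggers the one-time
    // full load. Returns false if the name is unknown even after loading.
    bool lookup(const std::string &name, std::string &value) {
        Table::const_iterator it = seeded_.find(name);
        if (it != seeded_.end()) {
            value = it->second;
            return true;
        }
        ensure_loaded();
        it = full_.find(name);
        if (it == full_.end()) {
            return false;
        }
        value = it->second;
        return true;
    }

    // Reconfig (e.g. on SIGHUP): seeded values may be stale, so drop them
    // and rebuild the full table from the loader.
    void reconfig() {
        seeded_.clear();
        full_.clear();
        loaded_ = false;
        ensure_loaded();
    }

    bool loaded() const { return loaded_; }

private:
    void ensure_loaded() {
        if (!loaded_) {
            full_ = loader_();
            loaded_ = true;
        }
    }

    Table seeded_;
    Table full_;
    std::function<Table()> loader_;
    bool loaded_ = false;
};
```

As long as the shadow only asks for seeded parameters, the full table is never built; the first unknown name or reconfig pays the whole cost once.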

Brian