HTCondor Project List Archives




Re: [Condor-devel] Thoughts on decreasing shadow memory use



Comments inline below.

----- Original Message -----
> From: "Brian Bockelman" <bbockelm@xxxxxxxxxxx>
> To: "Condor Developers" <condor-devel@xxxxxxxxxxx>
> Sent: Tuesday, July 17, 2012 6:37:45 PM
> Subject: [Condor-devel] Thoughts on decreasing shadow memory use
> 
> Hi,
> 
> When I last talked to Miron about multi-shadow, he suggested first
> wringing every last byte out of the current one before even
> proposing the multi-shadow.  So, I spent about an hour with igprof
> and staring at smaps.
> 
> I measured a shadow as having 360KB of heap, about 550KB total
> unshared space, and 274KB of data live on the heap (so about 25%
> waste due to fragmentation).
> 
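For anyone who wants to reproduce the unshared-space figure, here is a minimal sketch of totalling private memory per mapping from smaps output. It assumes the Linux /proc/<pid>/smaps layout (a mapping header line followed by "Field: N kB" lines). The function name, the sample paths, and the sample text are hypothetical; the sample numbers just echo the figures quoted in this thread and are not real output. This is not necessarily how the measurements above were taken.

```python
import re
from collections import defaultdict

# Mapping header, e.g. "00400000-0040b000 r-xp 00000000 08:01 131  /path/to/binary"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")
# Only the private (unshared) counters are of interest here.
FIELD = re.compile(r"^(Private_Clean|Private_Dirty):\s+(\d+)\s+kB$")


def private_kb_by_mapping(smaps_text):
    """Sum Private_Clean + Private_Dirty (in kB) for each mapping name."""
    totals = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Mappings without a pathname are grouped under "[anon]".
            current = header.group(1) or "[anon]"
            continue
        field = FIELD.match(line)
        if field and current is not None:
            totals[current] += int(field.group(2))
    return dict(totals)


# Synthetic sample (hypothetical paths, illustrative numbers only).
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 131  /usr/sbin/condor_shadow
Size:                 44 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
7f00aa000000-7f00aa027000 rw-p 00150000 08:01 977  /usr/lib/libcondor_utils.so
Private_Clean:         0 kB
Private_Dirty:       156 kB
01a3c000-01a96000 rw-p 00000000 00:00 0  [heap]
Private_Clean:         0 kB
Private_Dirty:       360 kB
"""

if __name__ == "__main__":
    for name, kb in sorted(private_kb_by_mapping(SAMPLE).items()):
        print(f"{kb:6d} kB  {name}")
```

To run it against a live process, read the text from /proc/<pid>/smaps instead of SAMPLE.
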
> Here's what I found that we could save.  List is in ascending order
> of difficulty to implement.
> 0) Turn off classad caching: 55KB.

I'm all for disabling it on the shadows. Honestly, it only really makes sense for the schedd and collector at this point. (Luckily, Jamie added a knob.)

> 1) Copy of job's classad inside the file transfer object: 8KB
> 2) gethostbyaddr -> gethostbyaddr_r (including all callsites, even in
> the logging code!  See ExecuteEvent::writeEvent): 5KB.
> 3) getpwnam, getpwuid to reentrant versions: 2KB
> 4) Remove stats object from DaemonCore for shadow: 7KB
> 5) libcondor_utils has 156KB of dirty writable memory (non-const
> statics?) that can't be shared: 100KB?  This part was not included
> in my heap calculations, but is indeed non-shared.

This seems odd to me. I may dig into it further; could you file a ticket?

> 6) Cleanup of auth code to reduce heap fragmentation: 5-15KB
> 7) Un-loading the IpVerify table after usage: 9KB.
> 8) The configuration subsystem.  This would be one tough nugget to
> crack (note: would all be shared with the multi-shadow), but is very
> lightly used after the shadow fires up.  70KB.

If we further minimize the config files and push as much as possible into the param table, this footprint should shrink.

> 
> Lessons learned:
> - Classad caching does more harm than good for a single shadow (20%
> of heap)
> - If we squeeze really hard at odds-n-ends in the heap, we can shrink
> the heap by 10%.  I don't think all the items listed above are
> plausible (especially 8).
> - Non-const globals in libcondor_utils consist of 25% of the total
> memory footprint.  There are 332 source files in libcondor_utils -
> whack-a-mole time?
>   - Similarly, there are a few things sitting around in the other
>   Condor libraries, but nothing as sizable.
> - Obviously sharable resources for the multi-shadow (parameter
> subsystem, auth hash maps and tables, daemon core object) make up
> 50% of the heap.
> - It's not immediately obvious how much the ClassAd cache will affect
> the multi-shadow, but I would expect a bit of sharing.  Let's
> estimate 50% of the current cache is sharable, or 10% of the total
> heap.
> 
> So, we can squeeze about 15% of the shadow size by continuing to
> shave things and turning off caching.
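As a sanity check on the arithmetic, here is a rough sketch that adds up the itemized savings from the list above. Two things are my assumptions rather than anything stated in the thread: I leave out items 5 and 8 (the data segment was not part of the heap numbers, and 8 was called implausible), and I divide the combined total by the 550KB unshared figure.

```python
# Per-item savings in KB, copied from the list above as (low, high) estimates.
# Items 5 and 8 are deliberately left out (assumption, see lead-in).
HEAP_ITEMS_KB = {
    "1) job ad copy in file transfer object": (8, 8),
    "2) gethostbyaddr -> gethostbyaddr_r": (5, 5),
    "3) getpwnam/getpwuid -> reentrant versions": (2, 2),
    "4) DaemonCore stats object": (7, 7),
    "6) auth code cleanup": (5, 15),
    "7) unload IpVerify table": (9, 9),
}
CLASSAD_CACHE_KB = 55  # item 0
HEAP_KB = 360          # measured heap
UNSHARED_KB = 550      # measured total unshared space


def savings_range(items):
    """Return (low, high) total savings in KB across all items."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


low, high = savings_range(HEAP_ITEMS_KB)
total_low, total_high = low + CLASSAD_CACHE_KB, high + CLASSAD_CACHE_KB

if __name__ == "__main__":
    print(f"odds-n-ends: {low}-{high} KB "
          f"({low / HEAP_KB:.0%}-{high / HEAP_KB:.0%} of heap)")
    print(f"plus caching off: {total_low}-{total_high} KB "
          f"({total_low / UNSHARED_KB:.0%}-{total_high / UNSHARED_KB:.0%} of unshared space)")
```

Under those assumptions the odds-and-ends come to 36-46KB, or roughly 10-13% of the heap, and the total with caching off is 91-101KB, or roughly 17-18% of unshared space. That is in the same neighborhood as the quoted 10% and 15%, given the rounding.
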
> 
> Assuming 10 jobs per 1 shadow, we could realize a 60% memory gain.
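One simple way to model the multi-shadow gain, as a sketch only: if a fraction s of a shadow's memory can be shared and N jobs run per shadow, each job costs (1 - s) + s/N of a standalone shadow. The model, the function name, and the choice of s = 0.6 (the 50% sharable resources plus the 10% cache estimate, applied to the whole shadow rather than just the heap) are my assumptions, and this is not necessarily how the 60% figure was derived.

```python
def per_job_gain(shared_fraction, jobs_per_shadow):
    """Fraction of per-job memory saved versus one shadow per job.

    shared_fraction: portion of a standalone shadow that could be shared (0..1).
    jobs_per_shadow: number of jobs served by one multi-shadow (>= 1).
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    if jobs_per_shadow < 1:
        raise ValueError("jobs_per_shadow must be at least 1")
    # Each job keeps its private part and pays 1/N of the shared part.
    per_job_cost = (1.0 - shared_fraction) + shared_fraction / jobs_per_shadow
    return 1.0 - per_job_cost


# Assumed sharable fraction: 50% sharable resources + 10% ClassAd cache estimate.
SHARED = 0.50 + 0.10

if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(f"{n:4d} jobs per shadow: {per_job_gain(SHARED, n):.0%} saved per job")
```

With s = 0.6 and 10 jobs per shadow this gives about 54%, which is the same ballpark as the quoted 60%. The gain can never exceed s itself, so finding more sharable memory (for example in the data segment) matters more than raising N.
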
> 
> Both numbers become more dramatic if we can figure out who's hanging
> out in the data segment.
> 
> Brian
> 
> PS - all numbers have been rounded and self-consistency is limited to
> my ability to do mental math.
> 
> PPS - after 5 minutes with 'nm', it appears the data segment consists
> primarily of the parameter table.  DOH!