Re: [DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead


Date: Thu, 19 Feb 2015 12:50:41 -0600
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead
On 02/19/2015 10:25 AM, budchan chao wrote:
> Hi All,
>
> If I understand it correctly, Dyninst uses ptrace to attach to and
> modify the mutatee. I want to check how much runtime overhead it
> causes to mutate an instrumentation point. I am also interested in
> the runtime overhead of a trampoline. Are there any existing
> benchmarks I can run to get these numbers? If not, I would really
> appreciate any tips on constructing such benchmarks, as I am new to
> the project.

Obligatory disclaimer: Dyninst overhead is highly variable depending on the context in which you're using it and your skill at writing an efficient mutator. I'm trying to give good general information below; if you can share a bit about the environment you're working in, I (and the rest of the list) can provide more focused advice.

We've generally used SPECINT/SPECFP as our baseline set of mutatees for overhead testing. Precise benchmarking of various components of our instrumentation overhead can require some tweaking of Dyninst internals; we haven't released any standard benchmarking mutators (that I'm aware of) recently.

You can insert null/no-op instrumentation at your desired instrumentation points and get a reasonable benchmark of the springboard/relocation overhead associated with instrumenting those points.
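Concretely, that looks something like this (a rough sketch against the BPatch API; the target PID argument and the function name "compute" are placeholders, and error handling is pared down):

#include "BPatch.h"
#include "BPatch_function.h"
#include "BPatch_point.h"
#include "BPatch_process.h"
#include "BPatch_snippet.h"
#include <cstdlib>
#include <vector>

int main(int argc, char **argv) {
    BPatch bpatch;

    // Attach to a running mutatee by PID; on some platforms you may
    // need to pass the binary's path instead of nullptr.
    BPatch_process *proc = bpatch.processAttach(nullptr, std::atoi(argv[1]));

    std::vector<BPatch_function *> funcs;
    proc->getImage()->findFunction("compute", funcs);  // placeholder name
    if (funcs.empty()) return 1;

    // A null snippet generates no user code, so any slowdown you
    // measure in the mutatee is springboard/relocation cost, not
    // snippet cost.
    BPatch_nullExpr nop;
    proc->insertSnippet(nop, *funcs[0]->findPoint(BPatch_entry));

    proc->continueExecution();
    while (!proc->isTerminated())
        bpatch.waitForStatusChange();
    return 0;
}

Compare the mutatee's end-to-end runtime with and without this attached, and the delta is the springboard/relocation number for those points.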

> For the trampoline overhead, I think I can time a call loop for an
> empty (inlined) function with entry instrumentation. For the first
> one, I think measuring the elapsed time between processAttach and
> continueExecution would do the trick. Am I correct? I just want to
> make sure I am thinking about this correctly.

Calling an empty function with entry instrumentation is going to give you skewed relative overhead and may or may not give you useful absolute overhead. Relative overhead will, to a first approximation, be proportional to the fraction of new instructions added, and most functions you'd want to instrument in real code are not actually empty. An empty function is also going to have potentially very different cache behavior from a real-world function, and the perturbations that instrumentation causes there will have little to do with the sorts of perturbations we see in real applications.
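If you want a microbenchmark anyway, give the callee a body that does some work, so the instrumented-to-original instruction ratio is closer to what real code sees; something like this as the mutatee (the workload is an arbitrary stand-in):

#include <chrono>
#include <cstdio>

// A callee with a non-trivial body; noinline keeps a real entry
// point for Dyninst to instrument.
__attribute__((noinline))
double body(double x) {
    for (int i = 0; i < 100; ++i)
        x = x * 1.0000001 + 0.5;
    return x;
}

int main() {
    auto t0 = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (int i = 0; i < 1000000; ++i)
        acc += body(i);
    auto t1 = std::chrono::steady_clock::now();

    // Run once uninstrumented and once with entry instrumentation on
    // body(); the ratio of the two times is your relative overhead.
    std::printf("%f s (acc=%g)\n",
                std::chrono::duration<double>(t1 - t0).count(), acc);
    return 0;
}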

There should be some measure of parsing time that's amortized into the first instrumentation operation on a given DSO in a process. I don't know how precisely you want to separate parsing, code generation, and the actual mechanics of inserting a generated binary blob, but what you're proposing to measure between attach and continue is going to contain some of each of those.
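You can at least bracket those phases with timestamps in the mutator, keeping in mind that the first insertion into a module is also paying the parse cost; a sketch (same placeholder function name as above):

#include "BPatch.h"
#include "BPatch_function.h"
#include "BPatch_point.h"
#include "BPatch_process.h"
#include "BPatch_snippet.h"
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

using clk = std::chrono::steady_clock;

static double secs(clk::time_point a, clk::time_point b) {
    return std::chrono::duration<double>(b - a).count();
}

int main(int argc, char **argv) {
    BPatch bpatch;

    auto t0 = clk::now();
    BPatch_process *proc = bpatch.processAttach(nullptr, std::atoi(argv[1]));
    auto t1 = clk::now();  // attach + initial bookkeeping

    std::vector<BPatch_function *> funcs;
    proc->getImage()->findFunction("compute", funcs);  // placeholder name
    BPatch_nullExpr nop;
    proc->insertSnippet(nop, *funcs[0]->findPoint(BPatch_entry));
    auto t2 = clk::now();  // lookup, codegen, insertion; the first
                           // insertion also amortizes parse time

    proc->continueExecution();
    auto t3 = clk::now();

    std::printf("attach %.3fs  instrument %.3fs  continue %.3fs\n",
                secs(t0, t1), secs(t1, t2), secs(t2, t3));
    return 0;
}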

> Also, I was wondering if there is a way to do the dynamic
> instrumentation "in-band", if that makes sense (e.g., using a
> separate thread in the same process, so that there is no need for a
> separate mutator process).

There have been various projects in the group over the years that do in-band (or first-party, as we refer to it) instrumentation. As far as I know, none of them have taken a separate thread approach. There's also Dyninst's binary rewriting mode, where the parsing/codegen/instrumentation process occurs once up-front and then you run the instrumented binary on its own.
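The rewriting flow, for reference (a sketch; the paths and the function name are placeholders):

#include "BPatch.h"
#include "BPatch_binaryEdit.h"
#include "BPatch_function.h"
#include "BPatch_point.h"
#include "BPatch_snippet.h"
#include <vector>

int main() {
    BPatch bpatch;

    BPatch_binaryEdit *app = bpatch.openBinary("a.out");  // placeholder path

    std::vector<BPatch_function *> funcs;
    app->getImage()->findFunction("compute", funcs);      // placeholder name
    if (funcs.empty()) return 1;

    BPatch_nullExpr nop;
    app->insertSnippet(nop, *funcs[0]->findPoint(BPatch_entry));

    // Parsing, codegen, and insertion all happen here, once; the
    // output binary then runs on its own with no mutator attached.
    app->writeFile("a.out.inst");
    return 0;
}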

> Regards,
> Chan


--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx