Re: [DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead

Mailing List Archives	UW Madison Computer Sciences Department Computer Systems Lab

Date:	Thu, 19 Feb 2015 22:09:39 +0000 (UTC)
From:	budchan chao <cbudchan@xxxxxxxxx>
Subject:	Re: [DynInst_API:] Measuring Dyninst Dynamic Instrumentation Overhead

Thanks for the reply. Please find some responses inline.

On Thursday, 19 February 2015 1:50 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:

On 02/19/2015 10:25 AM, budchan chao wrote:
> Hi All,
>
> If I understand it correctly, Dyninst uses ptrace to connect to and modify
> the mutatee. I want to check how much runtime overhead it causes to
> mutate an instrumentation point. I am also interested in the runtime
> overhead of a trampoline. Are there any existing benchmarks I can run
> to get these numbers? If not, I would really appreciate any tips on
> building these benchmarks, since I am new to the project.

Obligatory disclaimer: Dyninst overhead is highly variable depending on
the context in which you're using it and your skill at writing an
efficient mutator. I'm trying to give good general information below; if
you can share a bit about the environment you're working in, I (and the
rest of the list) can provide more focused advice.

I am working with x86 ELF binaries compiled with GCC.

We've generally used SPECINT/SPECFP as our baseline set of mutatees for
overhead testing. Precise benchmarking of various components of our
instrumentation overhead can require some tweaking of Dyninst internals;
we haven't released any standard benchmarking mutators (that I'm aware
of) recently.

I have several SPECINT applications (h264, etc.) that I plan to use later for
benchmarking with DynInst to get an idea of the typical overheads involved.
Any suggestions for a good set of benchmark applications that cover varied
runtime behaviors?

You can insert null/no-op instrumentation at your desired
instrumentation points and get a reasonable benchmark of the
springboard/relocation overhead associated with instrumenting those points.
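A minimal mutatee-side sketch of that measurement, assuming a GCC-compiled x86 target as described in this thread. The function `work()` is a hypothetical stand-in for the code you actually want to instrument; `ns_per_call()` would be run once with no instrumentation and once after a mutator has inserted a null snippet at the entry of `work` (Dyninst's BPatch API includes a `BPatch_nullExpr` snippet class that appears suited to this). All names in the block are illustrative, not part of Dyninst.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the function you want to instrument. noinline (a
// GCC/Clang attribute) keeps a real call and function entry in the binary.
__attribute__((noinline)) unsigned work(unsigned x) {
    unsigned acc = 0;  // unsigned so that wraparound is well-defined
    for (unsigned i = 0; i < 64; ++i) acc += (x ^ i) * 31u;
    return acc;
}

// Average wall-clock nanoseconds per call to work() over `iters` calls.
double ns_per_call(long iters) {
    assert(iters > 0);
    volatile unsigned sink = 0;  // keeps results live so the loop is not optimized out
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) sink = sink + work(static_cast<unsigned>(i));
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iters;
}

// Fractional slowdown of the instrumented run relative to the baseline run.
double relative_overhead(double baseline_ns, double instrumented_ns) {
    assert(baseline_ns > 0.0);
    return (instrumented_ns - baseline_ns) / baseline_ns;
}
```

Run the same binary twice with identical iteration counts; the difference between the two per-call figures approximates the springboard/relocation cost for this particular function, subject to the caveats about unrepresentative functions raised in this thread.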

I will try that.

> For the trampoline overhead, I think I can time a loop of calls to an empty
> function (inlined) with entry instrumentation. For the first one, I
> think measuring the elapsed time between processAttach and continueExecution
> would do the trick. Am I correct? I just want to make sure I am thinking
> about this correctly.
>

Calling an empty function with entry instrumentation is going to give
you skewed relative overhead and may or may not give you useful absolute
overhead. Relative overhead will, to a first approximation, be
proportional to the fraction of new instructions added, and most
functions you'd want to instrument in real code are not actually empty.
An empty function is also going to have potentially very different cache
behavior from a real-world function, and the perturbations that
instrumentation causes there will have little to do with the sorts of
perturbations we see in real applications.
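The first-approximation argument can be made concrete with a little arithmetic. The sketch below weights each instrumented function by its dynamic call count and divides the added instructions by the original ones. The struct and function names are hypothetical, the instruction counts in the usage note are round numbers chosen for illustration rather than measurements, and the model deliberately ignores the cache effects mentioned above.

```cpp
#include <cassert>
#include <vector>

// One instrumented function in a simple first-approximation overhead model.
// All three fields are inputs you would supply (e.g. from a profiler).
struct InstrumentedFunc {
    double calls;        // dynamic number of calls
    double body_insns;   // instructions executed per call, uninstrumented
    double added_insns;  // extra instructions per call (springboard + snippet)
};

// Estimated relative overhead = added instructions / original instructions,
// both summed over all dynamic calls. Ignores cache and pipeline effects.
double estimated_relative_overhead(const std::vector<InstrumentedFunc>& funcs) {
    double base = 0.0, added = 0.0;
    for (const auto& f : funcs) {
        base += f.calls * f.body_insns;
        added += f.calls * f.added_insns;
    }
    assert(base > 0.0);
    return added / base;
}
```

With a hypothetical 20 added instructions per call, a near-empty 4-instruction function called a million times gives an estimate of 5.0 (500%), while a 400-instruction function gives 0.05 (5%). The same instrumentation looks a hundred times worse on the empty function purely because of the denominator.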

This indeed makes sense.

There should be some measure of parsing time that's amortized into the
first instrumentation operation on a given DSO in a process. I don't
know how precisely you want to separate parsing, code generation, and
the actual mechanics of inserting a generated binary blob, but what
you're proposing to measure between attach and continue is going to
contain some of each of those.
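One way to see how much of the attach-to-continue interval goes to each step is to timestamp the mutator between calls. The helper below is a generic sketch; the `PhaseTimer` class and its methods are hypothetical, not part of Dyninst. Each `mark()` records the time elapsed since the previous mark.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Records how long each named step took, measured from the previous mark()
// (or from construction, for the first step).
class PhaseTimer {
    using Clock = std::chrono::steady_clock;

public:
    PhaseTimer() : last_(Clock::now()) {}

    // Close the current phase under `name` and start timing the next one.
    void mark(const std::string& name) {
        assert(!name.empty());
        const auto now = Clock::now();
        phases_.emplace_back(name, std::chrono::duration<double, std::milli>(now - last_).count());
        last_ = now;
    }

    // (phase name, elapsed milliseconds) in the order the phases were marked.
    const std::vector<std::pair<std::string, double>>& phases() const { return phases_; }

private:
    Clock::time_point last_;
    std::vector<std::pair<std::string, double>> phases_;
};
```

In a mutator you might call `mark("attach")` after `processAttach`, `mark("first insert")` after the first `insertSnippet` on a given DSO, and `mark("continue")` after `continueExecution`. Because parsing time is amortized into the first instrumentation operation on a DSO, the "first insert" figure mixes parsing and code generation rather than isolating either.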

What if I insert a snippet, then somehow remove it (I haven't yet seen the API
calls for removing snippets at runtime), and re-insert it? Would Dyninst cache
the generated code and just reuse it the second time around? In that case,
I could potentially time that second insertion to leave out the code generation
overhead.
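Whether Dyninst reuses generated code on re-insertion is not answered in this message, but it can be checked empirically by timing the same insertion several times and comparing the first run with later ones. A generic sketch follows; `time_repeated` is a hypothetical helper. The `op` you pass in would be your own insertion code, and `reset` would undo it outside the timed region (the BPatch API appears to provide `deleteSnippet` for removal; check the Dyninst documentation for its exact semantics).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Times `op` `reps` times, calling `reset` (untimed) after each run, and
// returns every run's duration in microseconds. Comparing element 0 with the
// rest shows whether repeated runs are cheaper than the first.
std::vector<double> time_repeated(const std::function<void()>& op,
                                  const std::function<void()>& reset, int reps) {
    assert(reps > 0);
    std::vector<double> us;
    us.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        op();
        const auto t1 = std::chrono::steady_clock::now();
        us.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
        if (reset) reset();  // e.g. undo the insertion, outside the timed region
    }
    return us;
}
```

A drop after the first run does not by itself prove code caching: since parsing is amortized into the first instrumentation of a DSO, an unrelated warm-up insertion in the same DSO beforehand may help separate the two effects, assuming parsing is done per DSO as described in this thread.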

> Also, I was wondering if there is a way to do the dynamic instrumentation
> "in-band", if that makes sense (for example, using a separate thread in the
> same process so that no separate mutator process is needed).

There have been various projects in the group over the years that do
in-band (or first-party, as we refer to it) instrumentation. As far as I
know, none of them have taken a separate thread approach. There's also
Dyninst's binary rewriting mode, where the
parsing/codegen/instrumentation process occurs once up-front and then
you run the instrumented binary on its own.

Interesting.

> Regards
> Chan

>
>
> _______________________________________________
> Dyninst-api mailing list
> Dyninst-api@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>

--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx
