Re: [DynInst_API:] A question about dynInst's static instrumentation ability


Date: Mon, 24 Aug 2015 13:22:27 -0500
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] A question about dynInst's static instrumentation ability
On 08/24/2015 01:02 PM, Shuai Wang wrote:
Hello Bill,


Thank you for your response! I didn't know about this mechanism before,
and I am very interested in it!
May I ask how Dyninst decides when to apply this optimization, and
when not to?

The quick and oversimplified answer: if a function is known to have unresolved control flow, which would happen as a result of an indirect branch that we can't statically parse, we would not be able to relocate the function safely, as we wouldn't be able to tell whether existing basic blocks were split by control flow we didn't know about. Otherwise, it's safe provided that we appropriately translate all PC-sensitive instructions. I really do recommend reading Drew's paper; it covers this far more precisely than I can over email.
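
To make the two conditions above concrete, here is a minimal Python sketch. It is a toy model, not Dyninst code: the Function class, its has_unresolved_indirect_branch flag, and both helper functions are hypothetical names invented for illustration. The first helper encodes the oversimplified rule (a function with no unresolved control flow can be relocated). The second shows the arithmetic for one kind of PC-sensitive instruction, an x86-64 branch or call with a 32-bit relative displacement, under the assumption that its target stays at its original address.

```python
from dataclasses import dataclass


@dataclass
class Function:
    # Hypothetical stand-in for a parsed function; not a Dyninst type.
    name: str
    has_unresolved_indirect_branch: bool


def can_relocate(func: Function) -> bool:
    """Oversimplified rule: relocation is unsafe if any control flow is unresolved."""
    return not func.has_unresolved_indirect_branch


def relocate_rel32(old_addr: int, insn_len: int, old_disp: int, new_addr: int) -> int:
    """Recompute a rel32 displacement so the moved instruction hits the same target.

    On x86-64 the target of a relative branch is the address of the next
    instruction plus the displacement. Assumes the target itself does not move.
    """
    target = old_addr + insn_len + old_disp
    new_disp = target - (new_addr + insn_len)
    if not -2**31 <= new_disp < 2**31:
        raise ValueError("target is out of rel32 range from the new address")
    return new_disp


# Illustrative addresses: a 5-byte jmp at 0x400500 with displacement 0x20
# targets 0x400525. Copied to 0x700280, it needs a new displacement.
new_disp = relocate_rel32(0x400500, 5, 0x20, 0x700280)
```

In this example new_disp is -0x2ffd60, because 0x700280 + 5 - 0x2ffd60 is again 0x400525. A real rewriter also has to handle targets that move along with the function, as well as other PC-sensitive instructions such as RIP-relative loads; none of that is modeled here.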

Can I turn on or off this mechanism by configuration?

No, you'd have to work with the Dyninst internals. Once we determined that block-level relocation was horrifically expensive due to instruction cache misses, we rewrote our relocation system to be purely function-oriented.
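
As a rough illustration of the trade-off, the sketch below counts extra jumps only, following the explanation quoted later in this thread that jumping out to a patch area and back costs two extra jumps per basic block. It is a toy cost model with hypothetical function names, not a Dyninst measurement. It does not model the instruction cache behavior mentioned above, and it assumes that a relocated function is entered through a single jump from its original entry point.

```python
def extra_jumps_block_level(executed_blocks: int) -> int:
    """Each executed block jumps to its patch area and back: two extra jumps."""
    return 2 * executed_blocks


def extra_jumps_function_level(function_entries: int) -> int:
    """One jump from the original entry point into the relocated copy per entry."""
    return function_entries


# Hypothetical workload: a function entered 10 times, executing 50 blocks per call.
entries, blocks_per_call = 10, 50
block_cost = extra_jumps_block_level(entries * blocks_per_call)
function_cost = extra_jumps_function_level(entries)
print(block_cost, function_cost)  # prints "1000 10"
```

Under these assumptions the function-oriented scheme pays for one jump per call, while the block-oriented scheme pays in proportion to the number of blocks executed.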

You might be able to recover an older relocation system from an older version of Dyninst, but I wouldn't recommend that for anything other than satisfying personal curiosity--the older versions are not robust against changes in compilers etc. that have happened since they were released.

Sorry if I am troubling you too much. Looking forward to your response!

Sincerely,
Shuai





On Mon, Aug 24, 2015 at 12:38 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:

    On 08/23/2015 12:22 AM, Shuai Wang wrote:

        Hello Xiaozhu,

        Thank you a lot for your response. I double-checked the gdb output,
        and I suppose only one piece of instrumentation code is indeed
        executed.

        In particular, even though the basic blocks are instrumented like
        this (please see the jmpq instructions):

        http://i.stack.imgur.com/Zl0ar.png


        But in the gdb debugging session, only one "addq" instruction is
        actually inserted.

        http://i.stack.imgur.com/NHx7F.png

        Am I missing anything?

    You may want to take a look at the code coverage example, available
    here:

    http://www.paradyn.org/html/tools/codecoverage.html

    It's doing both function-level and block-level code coverage.


        BTW: How can you actually put all the instrumentation code and the
        original code together in one section? IMHO, as you don't have the
        relocation information in the disassembled output, you cannot
        directly "inline" instrumentation code into the original code.
        Could you please elaborate a little bit?

    This topic is covered at length in Drew Bernat's Anywhere, Anytime
    Binary Instrumentation paper:

    ftp://ftp.cs.wisc.edu/paradyn/papers/Bernat11AWAT.pdf

    The short version: if we parse the binary sufficiently accurately,
    and we are careful of what we know and what we don't know, we can
    relocate most code safely without compiler-level relocation
    information, and we can tell what's not safe to relocate. It's not
    easy, but it's not impossible either.


        Thank you a lot for your response.


        Sincerely,
        Shuai



        On Sun, Aug 23, 2015 at 1:05 AM, Xiaozhu Meng <mxz297@xxxxxxxxx> wrote:

             Hi Shuai,

             Since you instrumented every basic block of a function, Dyninst
             would relocate the whole original function to another section.
             The relocated function would contain both the original code and
             the instrumentation code. Therefore, executing all the
             instructions at the patched sections would actually execute both
             your instrumentation and the original code. One reason to not
             jump back immediately after instrumentation is that executing
             two extra jumps for each basic block would significantly slow
             down the execution.

             Thanks

             --Xiaozhu

             On Sat, Aug 22, 2015 at 10:37 PM, Shuai Wang <wangshuai901@xxxxxxxxx> wrote:
              > Dear list,
              >
              >
              > I basically want to instrument an ELF binary, adding some
              > instrumentation code to the beginning of every basic block.  I use
              > DynInst version 8.2.1 on a 64-bit Linux platform. I am instrumenting
              > some unstripped binaries now but I want to move forward to stripped
              > binaries later.
              >
              > I found a very confusing situation in the instrumented output; could
              > anyone educate me on that? Sorry if it is really a stupid question.
              > Let me elaborate here:
              >
              > 1. I insert one instruction at the beginning of every basic block.
              >
              > 2. After instrumentation, I use objdump to check the output, and I am
              > assured that each basic block's beginning instruction(s) have been
              > replaced with a "jmp" instruction to the patched section, something
              > like this:
              >            jmpq   700280 <main_dyninst>
              >
              > 3. I use gdb to follow the execution flow of the instrumented output,
              > and I observed that when the execution flow hits the first jmpq
              > instruction (at the beginning of the main function, actually), it is
              > redirected to the patched section.
              >
              > 4. I observed the execution at the patched section, including both the
              > instrumentation code and the replaced instructions from the
              > instrumentation point of the original binary. However, to my surprise,
              > the execution flow isn't redirected back to the original code section;
              > it just executes all the instructions in the patched section.  As a
              > result, even though I instrumented every basic block, only the
              > instrumentation code at the first basic block was actually executed at
              > runtime.
              >
              >
              > I supposed that for static instrumentation, after the instrumentation
              > code and replaced instructions in the patched section are executed, the
              > execution flow is redirected back by a jmp instruction to the original
              > code section. Am I missing anything here? Or do I have to configure
              > some options in my code for this type of functionality?
              >
              > Sorry for my disorganized description; am I clear?  If so, could anyone
              > give me some help? I would really appreciate it!
              >
              > Sincerely,
              > Shuai
              >
              >
              >
              > _______________________________________________
              > Dyninst-api mailing list
              > Dyninst-api@xxxxxxxxxxx <mailto:Dyninst-api@xxxxxxxxxxx>
        <mailto:Dyninst-api@xxxxxxxxxxx <mailto:Dyninst-api@xxxxxxxxxxx>>
              > https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
              >







    --
    --bw

    Bill Williams
    Paradyn Project
    bill@xxxxxxxxxxx <mailto:bill@xxxxxxxxxxx>




--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx