Re: [DynInst_API:] A question about dynInst's static instrumentation ability


Date: Mon, 24 Aug 2015 17:42:08 -0400
From: Shuai Wang <wangshuai901@xxxxxxxxx>
Subject: Re: [DynInst_API:] A question about dynInst's static instrumentation ability
Thank you, Bill. I looked into the details in the paper and also at the instrumented output; I suppose this is replica-based instrumentation. Anyway, thanks a lot.

On Mon, Aug 24, 2015 at 2:22 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:
On 08/24/2015 01:02 PM, Shuai Wang wrote:
Hello Bill,


Thank you for your response! I didn't know about this mechanism before and
I am very interested in it!
May I ask how Dyninst decides when to apply this optimization, and when
not to?

The quick and oversimplified answer: if a function is known to have unresolved control flow, which would happen as a result of an indirect branch that we can't statically parse, we would not be able to relocate the function safely, as we wouldn't be able to tell whether existing basic blocks were split by control flow we didn't know about. Otherwise, it's safe provided that we appropriately translate all PC-sensitive instructions. I really do recommend reading Drew's paper; it covers this far more precisely than I can over email.
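
The rule above can be sketched as a toy model in Python. This is not Dyninst code and does not use its API: the Branch and Function classes and the can_relocate helper are hypothetical names invented for illustration, and the sketch covers only the "unresolved indirect branch" condition, not the translation of PC-sensitive instructions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    """A branch found while parsing a function (hypothetical toy model)."""
    indirect: bool
    # Statically resolved targets; None means the parser could not resolve them.
    targets: Optional[List[int]] = None


@dataclass
class Function:
    name: str
    branches: List[Branch] = field(default_factory=list)


def has_unresolved_control_flow(func: Function) -> bool:
    """True if any indirect branch has targets the parser could not determine."""
    return any(b.indirect and b.targets is None for b in func.branches)


def can_relocate(func: Function) -> bool:
    """Toy version of the rule: relocate only when all control flow is known."""
    return not has_unresolved_control_flow(func)


# A function whose jump table was parsed successfully: treated as safe here.
resolved = Function("with_jump_table", [Branch(True, [0x400500, 0x400520])])
# A function with an indirect jump that could not be resolved: left in place.
unresolved = Function("computed_goto", [Branch(False, [0x400600]), Branch(True, None)])

print(can_relocate(resolved))    # True
print(can_relocate(unresolved))  # False
```

In this simplified view, a single unresolved indirect branch is enough to rule out relocating the whole function, since unknown control flow could land in the middle of a block that the parser believes is intact.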

Can I turn on or off this mechanism by configuration?

No, you'd have to work with the Dyninst internals. Once we determined that block-level relocation was horrifically expensive due to instruction cache misses, we rewrote our relocation system to be purely function-oriented.

You might be able to recover an older relocation system from an older version of Dyninst, but I wouldn't recommend that for anything other than satisfying personal curiosity--the older versions are not robust against changes in compilers etc. that have happened since they were released.

Sorry if I am troubling you too much. Looking forward to your response!

Sincerely,
Shuai





On Mon, Aug 24, 2015 at 12:38 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:

  On 08/23/2015 12:22 AM, Shuai Wang wrote:

    Hello Xiaozhu,

    Thank you a lot for your response. I double-checked the gdb output,
    and I suppose only one piece of instrumentation code is indeed
    executed.

    In particular, even though the basic blocks are instrumented like this
    (please see the jmpq instructions):

    http://i.stack.imgur.com/Zl0ar.png

    in the gdb session only one "addq" instruction actually appears to be
    inserted:

    http://i.stack.imgur.com/NHx7F.png

    Am I missing anything?

  You may want to take a look at the code coverage example, available
  here:

  http://www.paradyn.org/html/tools/codecoverage.html

  It's doing both function-level and block-level code coverage.


    BTW: How can you actually put all the instrumentation code and the
    original code together in one section? IMHO, as you don't have the
    relocation information in the disassembled output, you cannot directly
    "inline" instrumentation code into the original code. Could you please
    elaborate a little bit?

  This topic is covered at length in Drew Bernat's Anywhere, Anytime
  Binary Instrumentation paper:

  ftp://ftp.cs.wisc.edu/paradyn/papers/Bernat11AWAT.pdf

  The short version: if we parse the binary sufficiently accurately,
  and we are careful of what we know and what we don't know, we can
  relocate most code safely without compiler-level relocation
  information, and we can tell what's not safe to relocate. It's not
  easy, but it's not impossible either.


    Thank you a lot for your response.


    Sincerely,
    Shuai



    On Sun, Aug 23, 2015 at 1:05 AM, Xiaozhu Meng <mxz297@xxxxxxxxx> wrote:

      Hi Shuai,

      Since you instrumented every basic block of a function, Dyninst would
      relocate the whole original function to another section. The relocated
      function would contain both the original code and the instrumentation
      code. Therefore, executing all the instructions at the patched
      sections would actually execute both your instrumentation and the
      original code. One reason to not jump back immediately after
      instrumentation is that executing two extra jumps for each basic block
      would significantly slow down the execution.
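
The layout described above can be illustrated with a small Python toy model. It is only a sketch under stated assumptions, not Dyninst's actual implementation: the function is assumed to be straight-line (each block falls through to the next), and relocate_function, run_from_entry, and the instruction strings are hypothetical names made up for this example.

```python
from typing import List, Tuple

Block = Tuple[str, List[str]]  # (block name, original instructions)


def relocate_function(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Return (patched original blocks, relocated copy) for a toy function."""
    # The relocated copy holds the instrumentation followed by the original code.
    relocated = [(name, ["instrument " + name] + list(instrs)) for name, instrs in blocks]
    # Each original block start is overwritten with a jump to its relocated twin.
    patched = [(name, ["jmp relocated:" + name]) for name, _ in blocks]
    return patched, relocated


def run_from_entry(patched: List[Block], relocated: List[Block]) -> List[str]:
    """Trace one call: take the entry jump, then stay in the relocated copy."""
    trace = list(patched[0][1])
    for _, instrs in relocated:
        trace.extend(instrs)
    return trace


blocks = [("B0", ["push", "mov"]), ("B1", ["cmp"]), ("B2", ["ret"])]
patched, relocated = relocate_function(blocks)
trace = run_from_entry(patched, relocated)

jumps = sum(1 for ins in trace if ins.startswith("jmp"))
probes = sum(1 for ins in trace if ins.startswith("instrument"))
print(jumps, probes)    # 1 3
print(2 * len(blocks))  # 6 extra jumps if every block jumped out and back
```

In this model, one jump at function entry is enough to run the instrumentation of all three blocks, whereas a jump-out-and-back scheme would pay two jumps per block. The jumps left at the other original block starts matter only if control enters the original code at those addresses.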

      Thanks

      --Xiaozhu

      On Sat, Aug 22, 2015 at 10:37 PM, Shuai Wang <wangshuai901@xxxxxxxxx> wrote:
       > Dear list,
       >
       >
       > I basically want to instrument an ELF binary, adding some instrumentation
       > code to the beginning of every basic block. I use DynInst version 8.2.1 on
       > a 64-bit Linux platform. I am instrumenting some unstripped binaries now,
       > but I want to move on to stripped binaries later.
       >
       > I found a very confusing situation in the instrumented output; could
       > anyone educate me on that? Sorry if it is really a stupid question. Let
       > me elaborate here:
       >
       > 1. I insert one instruction at the beginning of every basic block.
       >
       > 2. After instrumentation, I use objdump to check the output, and I can
       > confirm that each basic block's beginning instruction(s) have been
       > substituted with a "jmp" instruction to the patched section, something
       > like this:
       >      jmpq   700280 <main_dyninst>
       >
       > 3. I use gdb to follow the execution flow of the instrumented output, and
       > I observed that when the execution flow hits the first jmpq instruction
       > (at the beginning of the main function, actually), it is redirected to
       > the patched section.
       >
       > 4. I observed the execution in the patched section, including both the
       > instrumentation code and the replaced instructions from the
       > instrumentation point of the original binary. However, to my surprise,
       > the execution flow isn't redirected back to the original code section;
       > it just executes all the instructions in the patched section. As a
       > result, even though I instrumented every basic block, only the
       > instrumentation code at the first basic block was actually executed at
       > runtime.
       >
       >
       > I supposed that for static instrumentation, after the instrumentation
       > code and the replaced instructions in the patched section have executed,
       > the execution flow is redirected back by a jmp instruction to the
       > original code section. Am I missing anything here? Or do I have to
       > configure some options in my code for this type of functionality?
       >
       > Sorry for my disorganized description; am I clear? If so, could anyone
       > give me some help? I would really appreciate it!
       >
       > Sincerely,
       > Shuai
       >
       >
       >
       >
       > _______________________________________________
       > Dyninst-api mailing list
       > Dyninst-api@xxxxxxxxxxx
       > https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
       >







  --
  --bw

  Bill Williams
  Paradyn Project
  bill@xxxxxxxxxxx




--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx
