Re: [DynInst_API:] DynInst Overhead


Date: Mon, 21 Jul 2014 22:01:04 -0400
From: Buddhika Chamith Kahawitage Don <budkahaw@xxxxxxxxxxxx>
Subject: Re: [DynInst_API:] DynInst Overhead
I tried building the v8.2 branch but got the following error:

arch-x86.C:368: error: ‘e_cmpsd_sse’ was not declared in this scope

I'd really appreciate it if you could (re)post the build instructions. I tried browsing the list archive but couldn't find a specific build how-to. Maybe I missed it due to the message volume.

Regards
Bud



On Mon, Jul 21, 2014 at 4:14 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:
On 07/21/2014 03:04 PM, Buddhika Chamith Kahawitage Don wrote:
Where can I get the sources for 8.2? I didn't see any links for 8.2 on
the site.

http://git.dyninst.org/dyninst.git, the v8.2 branch. I'd guess we're a week or so away from an official release, but the Linux code should be stable with high probability.

The build system has moved from autotools to CMake. I think I've already spammed the list with a HOWTO for that, but if I haven't, I can do so.

--bw

Regards
Bud

Sent from my mobile.

On Jul 21, 2014 4:01 PM, "Bill Williams" <bill@xxxxxxxxxxx> wrote:

  I just cleared the message with the full log to the list, but yes,
  there are some traps being installed (run grep -C2 "ret conflict" on
  the log to see where they're going; there are a nontrivial number of
  them). The only SPEC benchmark that we can instrument cleanly (so not
  omnetpp or povray) that still suffers from serious trap overhead on
  the 8.2 branch, AFAIK, is gcc, and that's on the order of 50%, not 50x.
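Bill's suggestion is to run grep -C2 "ret conflict" over the springboard debug log. If you would rather post-process that log from the same C++ code that drives the rewrite, here is a minimal sketch of the same context filter. grep_context and GrepResult are hypothetical names, and the only log text it relies on is the "ret conflict" string Bill mentions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Result of a grep -C style scan over a log.
struct GrepResult {
    std::size_t matches = 0;           // number of lines containing the needle
    std::vector<std::string> printed;  // matching lines plus context, in log order
};

// Hypothetical helper: keep every line within `context` lines of a line
// containing `needle`, similar to `grep -C<context> "<needle>"`. Unlike GNU
// grep, it does not emit "--" separators between non-adjacent groups.
GrepResult grep_context(const std::string& log_text, const std::string& needle,
                        std::size_t context) {
    assert(!needle.empty());  // an empty needle would match every line

    std::vector<std::string> lines;
    std::istringstream in(log_text);
    for (std::string line; std::getline(in, line);) lines.push_back(line);

    std::vector<bool> keep(lines.size(), false);
    GrepResult result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (lines[i].find(needle) == std::string::npos) continue;
        ++result.matches;
        // Mark the match and up to `context` lines on either side.
        std::size_t lo = i >= context ? i - context : 0;
        std::size_t hi = std::min(lines.size() - 1, i + context);
        for (std::size_t j = lo; j <= hi; ++j) keep[j] = true;
    }
    for (std::size_t i = 0; i < lines.size(); ++i)
        if (keep[i]) result.printed.push_back(lines[i]);
    return result;
}
```

For a real log, read the file into a string first (for example with std::ifstream and std::ostringstream) and call grep_context(text, "ret conflict", 2). result.matches is only a rough indication of how many trap sites were reported, assuming one such line per trap.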

  On 07/21/2014 02:55 PM, Buddhika Chamith Kahawitage Don wrote:

    My earlier mail is being held for moderator approval. In the
    meantime, let me just paste a small snippet from the output; I hope
    that is enough.


      createRelocSpringboards for 400dd6
      Looking for addr b7fb96 in function _init
      getRelocAddrs for orig addr 400dd6 /w/ block start 400dd6
      getRelocAddrs for orig addr 400dd6 /w/ block start 400dd6
      Adding branch: 400dd6 -> 400ddb
         Inserting taken space 400dd6 -> 400ddb /w/ range 0
      Generated springboard branch 400dd1->b7fafe
      Conflict called for 400dd1->400dd6
         looking for 400dd1
           Found 400dd1 -> 400dd6 /w/ state 1e
         No conflict, we're good
      createRelocSpringboards for 400dd1
      Looking for addr b7fafe in function _init
      getRelocAddrs for orig addr 400dd1 /w/ block start 400dd1
      getRelocAddrs for orig addr 400dd1 /w/ block start 400dd1
      Adding branch: 400dd1 -> 400dd6
         Inserting taken space 400dd1 -> 400dd6 /w/ range 0
      Installing 15980 springboards!
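When comparing rewrites (say, 8.1.2 against 8.2), it can help to pull a couple of totals out of a log like the one above. This is a minimal sketch that assumes nothing about the log format beyond the two message texts visible in the snippet ("Generated springboard branch ..." and "Installing N springboards!"). summarize_springboards and SpringboardSummary are hypothetical names, and the counts say nothing by themselves about which springboards ended up as traps.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts pulled from a DYNINST_DEBUG_SPRINGBOARD log.
struct SpringboardSummary {
    std::size_t generated = 0;  // "Generated springboard branch" lines seen
    long installed = -1;        // N from "Installing N springboards!", -1 if absent
};

// Hypothetical helper: matches only the two message texts seen in the posted
// log snippet; every other line is ignored.
SpringboardSummary summarize_springboards(const std::string& log_text) {
    const std::string generated_tag = "Generated springboard branch";
    const std::string installing_tag = "Installing ";
    SpringboardSummary s;
    std::istringstream in(log_text);
    for (std::string line; std::getline(in, line);) {
        if (line.find(generated_tag) != std::string::npos) ++s.generated;
        std::size_t pos = line.find(installing_tag);
        if (pos != std::string::npos &&
            line.find(" springboards!") != std::string::npos) {
            // std::stol parses the leading digits and stops at the space.
            s.installed = std::stol(line.substr(pos + installing_tag.size()));
        }
    }
    return s;
}
```

Run against the snippet above, installed would be 15980 and generated would be 1.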




    On Mon, Jul 21, 2014 at 3:41 PM, Buddhika Chamith Kahawitage Don
    <budkahaw@xxxxxxxxxxxx> wrote:

      Please find the output in the attached file.

      Regards
      Bud


      On Mon, Jul 21, 2014 at 3:13 PM, Bill Williams
      <bill@xxxxxxxxxxx> wrote:

        On 07/21/2014 01:59 PM, Buddhika Chamith Kahawitage Don wrote:

          Please find my responses inline.

          On Mon, Jul 21, 2014 at 1:48 PM, Bill Williams
          <bill@xxxxxxxxxxx> wrote:

             On 07/21/2014 11:52 AM, Matthew LeGendre wrote:


                Presumably you're running the CodeCoverage tool in two
                steps: (1) rewriting the binary and (2) running the
                rewritten binary. All of the analysis/rewriting overhead
                is in step 1, and the instrumentation overhead can be
                measured just by timing step 2.
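Timing step 2 boils down to comparing two wall-clock measurements. A minimal C++ sketch, assuming you run the original and rewritten binaries with identical inputs; time_command and overhead are hypothetical helper names, not part of Dyninst or the CodeCoverage tool.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <string>

// Run a shell command and return its wall-clock time in seconds.
// The time includes shell startup, and the exit status is ignored here.
double time_command(const std::string& cmd) {
    auto start = std::chrono::steady_clock::now();
    (void)std::system(cmd.c_str());
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Relative overhead of the instrumented run over the baseline run:
// 0.5 means 50% slower; 49.0 means the run took 50x as long.
double overhead(double baseline_s, double instrumented_s) {
    assert(baseline_s > 0.0);
    return instrumented_s / baseline_s - 1.0;
}
```

Typical use would be overhead(time_command("./app.orig args"), time_command("./app.rewritten args")), where the binary names and arguments are placeholders. Repeating each run a few times and taking the median smooths out noise. The distinction in the comment matters for this thread: roughly 50% overhead (0.5) is expected trap cost on gcc, while 50x (about 49.0) suggests something is wrong.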


          That's true.


                If you're getting 50x overhead on step 2 alone, then
                something's very wrong. I've got my own codeCoverage tool
                (which I unfortunately can't share yet), and I see only
                10% overhead.

              Hrm. If this is with a prebuilt, statically linked binary
              rather than a build from source against current Dyninst, we
              may also be hitting traps in an inner loop. That's closer to
              the right order of magnitude than trampguards would be; off
              the top of my head, trampguards would be in the 1.5-5x
              neighborhood.


          In fact, that was the use case I had in mind. But I was
          checking the static rewriting case first, since it was readily
          available with the code-coverage tool.


        Sorry, I meant a statically linked version of the CodeCoverage
        tool; apologies for the confusion.


              The source for CodeCoverage (which you can build against the
              latest Dyninst and be reasonably sure of *not* hitting traps
              in almost all of SPEC) is in our tools.git repository. I know
              we've fixed some performance regressions that turned up
              between the AWAT paper and 8.1.2.


          I am using Dyninst 8.1.2, which I built from source.

        Then yes, it's probably trap overhead, and 8.2 should fix it. I
        believe h264 was on the list of benchmarks with a performance
        regression that we've fixed for the current release.

        If you set DYNINST_DEBUG_SPRINGBOARD=1 in your environment and
        send me the output of the rewriting pass with that enabled, I'll
        be able to confirm the cause (and status) of this problem.
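Setting the variable in your shell before running the rewriter is the simplest route. If you launch the rewriting pass from a C++ driver instead, a minimal sketch is below. run_with_springboard_debug is a hypothetical name, and because this thread doesn't say which stream Dyninst writes its debug output to, the sketch captures both stdout and stderr in one file.

```cpp
#include <cassert>
#include <cstdlib>  // std::system, std::getenv; setenv is POSIX (declared via stdlib.h)
#include <string>

// Hypothetical wrapper: export DYNINST_DEBUG_SPRINGBOARD=1 so the child
// process inherits it, run the rewriting command, and redirect both stdout
// and stderr to log_path. log_path is pasted into a shell command unquoted,
// so keep it free of spaces and shell metacharacters.
int run_with_springboard_debug(const std::string& rewrite_cmd,
                               const std::string& log_path) {
    assert(!rewrite_cmd.empty());
    if (setenv("DYNINST_DEBUG_SPRINGBOARD", "1", /*overwrite=*/1) != 0)
        return -1;
    const std::string cmd = rewrite_cmd + " > " + log_path + " 2>&1";
    return std::system(cmd.c_str());  // raw status as returned by std::system
}
```

Pass the same codeCoverage command line you already use for step 1 (it isn't given in this thread) and a log file name such as springboard.log; that file is what Bill asked to see, and it can then be searched with grep -C2 "ret conflict".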



                Just an educated guess: I frequently see big overheads
                associated with trampoline guards. Dyninst should have
                realized that trampoline guards are unnecessary for
                codeCoverage and not emitted them. But if something went
                wrong, you can manually turn them off by putting a call to:

                  bpatch.setTrampRecursive(true);




          Tried it, without any success :(


                near the top of codeCoverage.C's main() function. If that
                makes a difference, let the list know; it would imply
                there's a bug that should be investigated.


          Any ideas on how to debug this?

          Thanks
          Bud



        --
        --bw

        Bill Williams
        Paradyn Project
        bill@xxxxxxxxxxx






  --
  --bw

  Bill Williams
  Paradyn Project
  bill@xxxxxxxxxxx



--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx
