Re: [DynInst_API:] DynInst Overhead


Date: Mon, 21 Jul 2014 15:14:53 -0500
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] DynInst Overhead
On 07/21/2014 03:04 PM, Buddhika Chamith Kahawitage Don wrote:
Where can I get the sources of 8.2? I didn't see any links for 8.2 in
the site.

http://git.dyninst.org/dyninst.git, the v8.2 branch. I'd guess we're a week or so away from official, but the Linux code should be stable with high probability.

Build system has moved from autotools to CMake; I think I've spammed the list previously with a HOWTO on that but if I haven't I can do so.

--bw

Regards
Bud

Sent from my mobile.

On Jul 21, 2014 4:01 PM, "Bill Williams" <bill@xxxxxxxxxxx> wrote:

    I just cleared the message with the full log to the list, but yes,
    there are some traps being installed (grep -C2 "ret conflict" to see
    where they're going, but there are a nontrivial number of them). The
    only SPEC benchmark that we can instrument cleanly (so not omnetpp
    or povray) that still suffers from serious trap overhead on the 8.2
    branch, AFAIK, is gcc--and that's on the order of 50%, not 50x.
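For anyone following along, the grep above can be turned into a quick tally. This is only a minimal sketch, assuming the rewriter's debug output has been saved to a text file and that trap sites show up as lines containing "ret conflict" (the marker named above; the exact layout of those lines is not shown in this thread). The function name and the returned dictionary keys are made up for illustration; the "Installing N springboards!" pattern is taken from the snippet quoted further down.

```python
import re

# Pattern copied from the debug snippet in this thread:
#   "Installing 15980 springboards!"
SPRINGBOARD_TOTAL = re.compile(r"Installing (\d+) springboards!")


def summarize_springboard_log(lines):
    """Tally trap-related lines in DYNINST_DEBUG_SPRINGBOARD output.

    `lines` is any iterable of strings (an open file works).
    Assumption: every line containing "ret conflict" marks one trap site.
    """
    conflicts = 0
    installed = None  # stays None if the summary line never appears
    for line in lines:
        if "ret conflict" in line:
            conflicts += 1
        match = SPRINGBOARD_TOTAL.search(line)
        if match:
            installed = int(match.group(1))
    return {"ret_conflicts": conflicts, "springboards_installed": installed}
```

Passing an open log file (for example `summarize_springboard_log(open("rewrite.log"))`, where the file name is a placeholder) gives a rough idea of how many springboards fell back to traps relative to the total installed.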

    On 07/21/2014 02:55 PM, Buddhika Chamith Kahawitage Don wrote:

        My earlier mail is being held for moderator approval. Anyway, let
        me just paste a small snippet from the output. Hope that should be
        enough.


             createRelocSpringboards for 400dd6
             Looking for addr b7fb96 in function _init
             getRelocAddrs for orig addr 400dd6 /w/ block start 400dd6
             getRelocAddrs for orig addr 400dd6 /w/ block start 400dd6
             Adding branch: 400dd6 -> 400ddb
                   Inserting taken space 400dd6 -> 400ddb /w/ range 0
             Generated springboard branch 400dd1->b7fafe
             Conflict called for 400dd1->400dd6
                   looking for 400dd1
                       Found 400dd1 -> 400dd6 /w/ state 1e
                   No conflict, we're good
             createRelocSpringboards for 400dd1
             Looking for addr b7fafe in function _init
             getRelocAddrs for orig addr 400dd1 /w/ block start 400dd1
             getRelocAddrs for orig addr 400dd1 /w/ block start 400dd1
             Adding branch: 400dd1 -> 400dd6
                   Inserting taken space 400dd1 -> 400dd6 /w/ range 0
             Installing 15980 springboards!




        On Mon, Jul 21, 2014 at 3:41 PM, Buddhika Chamith Kahawitage Don
        <budkahaw@xxxxxxxxxxxx> wrote:

             Please find the output in attached file.

             Regards
             Bud


             On Mon, Jul 21, 2014 at 3:13 PM, Bill Williams
             <bill@xxxxxxxxxxx> wrote:

                 On 07/21/2014 01:59 PM, Buddhika Chamith Kahawitage Don
                 wrote:

                     Please find my responses inline.

                     On Mon, Jul 21, 2014 at 1:48 PM, Bill Williams
                     <bill@xxxxxxxxxxx> wrote:

                          On 07/21/2014 11:52 AM, Matthew LeGendre wrote:


                              Presumably you're running the CodeCoverage tool
                              in two steps: 1) Rewriting the binary 2) Running
                              the rewritten binary.  All of the
                              analysis/rewriting overheads are in step 1, and
                              the instrumentation overhead can be measured
                              just by timing step 2.


                     That's true.


                              If you're getting 50x overhead on just step 2
                              then something's very wrong. I've got my own
                              codeCoverage tool (which I unfortunately can't
                              share yet) and I only see 10% overhead.
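Since the thread mixes "50x", "1.5-5x" and "10%" figures, a small helper for converting two step-2 timings into both forms may help keep them apart. This is a sketch only: the function name is hypothetical, and the timings in the comments are invented round numbers, not measurements from this thread.

```python
def slowdown(baseline_seconds, instrumented_seconds):
    """Compare a run of the rewritten binary against the original binary.

    Returns (factor, percent_overhead):
      factor           = instrumented / baseline   (the "Nx" figure)
      percent_overhead = (factor - 1) * 100        (the "N%" figure)
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    factor = instrumented_seconds / baseline_seconds
    return factor, (factor - 1.0) * 100.0


# Invented timings, for illustration only:
#   slowdown(8.0, 12.0)   -> factor 1.5,  i.e. 50% overhead
#   slowdown(2.0, 100.0)  -> factor 50.0, i.e. 4900% overhead ("50x")
```

Both numbers should come from timing step 2 alone (running the original and the rewritten binary), so that the rewriting pass itself is excluded.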

                          Hrm. If this is with a prebuilt, statically linked
                          binary and not with a build from source against
                          current Dyninst, we may also be hitting traps in an
                          inner loop. That's more the right order of
                          magnitude than trampguards would be--trampguards
                          would be in the 1.5-5x sort of neighborhood off the
                          top of my head.


                     In fact that was the use case I had in my mind. But I
                     was just checking the static rewriting case first up
                     since it was readily available with code-coverage tool.


                 Sorry, I meant a statically linked version of the CodeCoverage
                 tool; apologies for the confusion.


                          The source for CodeCoverage (which you can build
                          against the latest Dyninst and be reasonably sure
                          of *not* hitting traps in almost all of SPEC) is in
                          our tools.git repository. I know we've fixed some
                          performance regressions that turned up between the
                          AWAT paper and 8.1.2.


                     I am using dyninst 8.1.2 which I built from source.

                 Then yes, it's probably trap overhead, and 8.2 should fix
                 it--I believe h264 was on the list of benchmarks that had a
                 performance regression that we've fixed for the current
                 release.

                 If you set DYNINST_DEBUG_SPRINGBOARD=1 in your environment
                 and send me the output of the rewriting pass with that
                 enabled, I'll be able to confirm the cause (and status) of
                 this problem.
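One way to capture that output without changing the shell environment is to set the variable only for the rewriting run. A minimal sketch, assuming Python 3.7 or later: the function name is hypothetical, the command passed in is whatever invocation of the rewriter is already in use, and because this thread does not say whether the debug text goes to stdout or stderr, both streams are merged.

```python
import os
import subprocess


def run_with_springboard_debug(command):
    """Run `command` with DYNINST_DEBUG_SPRINGBOARD=1 and return its output.

    `command` is a list such as ["./codeCoverage", ...]; the actual
    arguments are not given in this thread and are left to the caller.
    """
    # Copy the current environment and add the debug variable for this run only.
    env = dict(os.environ, DYNINST_DEBUG_SPRINGBOARD="1")
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout (assumption above)
        text=True,
    )
    return result.stdout
```

The returned string can be written to a file and attached to a reply, or searched for the trap markers discussed elsewhere in the thread.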



                              Just an educated guess--I frequently see big
                              overheads associated with trampoline guards.
                              Dyninst should have realized trampoline guards
                              are unnecessary for codeCoverage and not emitted
                              them.  But if something went wrong you can
                              manually turn them off by putting a call to:

                                  bpatch.setTrampRecursive(true);



                     Tried it without any success :(


                              Near the top of codeCoverage.C's main() function.
                              If that makes a difference then let the list
                              know.  That implies there's a bug that should be
                              investigated.


                     Any ideas on how to debug this?

                     Thanks
                     Bud



                 --
                 --bw

                 Bill Williams
                 Paradyn Project
                 bill@xxxxxxxxxxx





    --
    --bw

    Bill Williams
    Paradyn Project
    bill@xxxxxxxxxxx



--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx