Re: [DynInst_API:] Basic basic block usage


Date: Fri, 24 May 2013 11:40:34 -0500
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Basic basic block usage
On 05/23/2013 11:33 PM, Marc Brünink wrote:
Hi Bill,

Thanks for your answer. I appreciate it very much.

I actually had problems because I expected that once an edge is marked interprocedural, all following edges are also interprocedural. Thus, I was a bit puzzled when I saw that the edge from <+7> to <+9> was classified as interprocedural and the basic block <+9> skipped, while the basic block at <+73> is still included in the function even though it is reachable via basic block <+9>, which was just declared not to be part of the function.

But I forgot that basic blocks might be shared.
So Dyninst creates the following two functions (I suppose; not verified):


F1	F2
x		0x00007ffff7b01890 <+0>:     cmpl   $0x0,0x2d793d(%rip)
x		0x00007ffff7b01897 <+7>:     jne    0x7ffff7b018a9 <read+25>
	x	0x00007ffff7b01899 <+9>:     mov    $0x0,%eax
	x	0x00007ffff7b0189e <+14>:    syscall
	x	0x00007ffff7b018a0 <+16>:    cmp    $0xfffffffffffff001,%rax
	x	0x00007ffff7b018a6 <+22>:    jae    0x7ffff7b018d9 <read+73>
	x	0x00007ffff7b018a8 <+24>:    retq
x		0x00007ffff7b018a9 <+25>:    sub    $0x8,%rsp
x		0x00007ffff7b018ad <+29>:    callq  0x7ffff7b1c9f0
x		0x00007ffff7b018b2 <+34>:    mov    %rax,(%rsp)
x		0x00007ffff7b018b6 <+38>:    mov    $0x0,%eax
x		0x00007ffff7b018bb <+43>:    syscall
x		0x00007ffff7b018bd <+45>:    mov    (%rsp),%rdi
x		0x00007ffff7b018c1 <+49>:    mov    %rax,%rdx
x		0x00007ffff7b018c4 <+52>:    callq  0x7ffff7b1ca50
x		0x00007ffff7b018c9 <+57>:    mov    %rdx,%rax
x		0x00007ffff7b018cc <+60>:    add    $0x8,%rsp
x		0x00007ffff7b018d0 <+64>:    cmp    $0xfffffffffffff001,%rax
x		0x00007ffff7b018d6 <+70>:    jae    0x7ffff7b018d9 <read+73>
x	x	0x00007ffff7b018d8 <+72>:    retq
x	x	0x00007ffff7b018d9 <+73>:    mov    0x2d1540(%rip),%rcx
x	x	0x00007ffff7b018e0 <+80>:    xor    %edx,%edx
x	x	0x00007ffff7b018e2 <+82>:    sub    %rax,%rdx
x	x	0x00007ffff7b018e5 <+85>:    mov    %edx,%fs:(%rcx)
x	x	0x00007ffff7b018e8 <+88>:    or     $0xffffffffffffffff,%rax
x	x	0x00007ffff7b018ec <+92>:    jmp    0x7ffff7b018d8 <read+72>


But if Dyninst shares basic blocks, I fail to see why the block at <+9> cannot be shared as well, unless "having a single entry point" means an entry basic block cannot be shared. Is there a technical reason why the entry basic block cannot be shared with another function? Or is it just that Dyninst first declares <+9> as an entry point of a function and then fails to realise that it is actually a shared block?

That's precisely it; blocks can be shared but entry blocks cannot be shared. I believe the below is a complete list of how we classify things in parsing, though I may be missing a corner case or two.

* The entry point of the binary is a function entry point
* Anything with a function symbol pointing to it is a function entry point
* Anything reached by a call instruction that is *not* a getpc call of some form is a function entry point; getpc calls are elided (as we need to modify them when we move code)
* Any edge targeting a function entry point is interprocedural
* Any return edge is interprocedural
* Any edge that we believe is a tail call based on stack heuristics is interprocedural
* A function, then, becomes the set of blocks dominated by an entry block and reachable without using interprocedural edges
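
To make the last two rules concrete, here is a minimal sketch in plain C++ of how the read example splits into two functions. It is a toy model and not Dyninst's parser: the ToyCFG, Edge, functionBlocks, and readExample names are hypothetical, call and return edges are simply left out of the graph, and blocks are identified by their offset from the start of read. It only implements "reachable from the entry without crossing an interprocedural edge".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical toy types; these are not Dyninst classes.
struct Edge {
    Addr target;
    bool tailCall;  // would come from a stack heuristic; always false below
};

struct ToyCFG {
    std::set<Addr> entries;                   // function entry blocks
    std::map<Addr, std::vector<Edge>> succs;  // block start -> outgoing edges
};

// An edge is interprocedural if it targets a function entry
// or has been flagged as a tail call.
bool isInterprocedural(const ToyCFG& cfg, const Edge& e) {
    return e.tailCall || cfg.entries.count(e.target) > 0;
}

// Collect the blocks reachable from `entry` without crossing
// any interprocedural edge.
std::set<Addr> functionBlocks(const ToyCFG& cfg, Addr entry) {
    std::set<Addr> seen{entry};
    std::vector<Addr> work{entry};
    while (!work.empty()) {
        Addr block = work.back();
        work.pop_back();
        auto it = cfg.succs.find(block);
        if (it == cfg.succs.end()) continue;  // e.g. a return block
        for (const Edge& e : it->second) {
            if (isInterprocedural(cfg, e)) continue;
            if (seen.insert(e.target).second) work.push_back(e.target);
        }
    }
    return seen;
}

// The read example from this thread, using offsets as block starts.
// <+0> has a function symbol; <+9> is the target of a call elsewhere.
ToyCFG readExample() {
    ToyCFG cfg;
    cfg.entries = {0, 9};
    cfg.succs[0]  = {{25, false}, {9, false}};   // jne <+25>, fall through to <+9>
    cfg.succs[9]  = {{73, false}, {24, false}};  // jae <+73>, fall through to <+24>
    cfg.succs[25] = {{34, false}};               // call fall-through
    cfg.succs[34] = {{57, false}};               // call fall-through
    cfg.succs[57] = {{73, false}, {72, false}};  // jae <+73>, fall through to <+72>
    cfg.succs[73] = {{72, false}};               // jmp <+72>
    return cfg;
}
```

With this model, functionBlocks(readExample(), 0) yields the six blocks in column F1 of the table above, and functionBlocks(readExample(), 9) yields the four blocks in column F2. The blocks at <+72> and <+73> end up in both sets, while <+9> is never pulled into F1 because the edge into it targets an entry.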

--bw

(I cc'ed the list again, because I think this might be worth archiving; compared to the previous msg which just contained a large tar)

Marc


On May 24, 2013, at 12:02 AM, Bill Williams wrote:

On 05/21/2013 08:38 PM, Marc Brünink wrote:
Output attached. If you need anything else, just let me know.

BTW: setting DYNINST_DEBUG_PARSING=1 leads to a bus error in the mutatee.

#0  0x00007f9594e94ed9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f9593b0cc10 in t_kill (pid=7054, sig=7) at ../src/RTlinux.c:94
#2  0x00007f9593b0d0e3 in DYNINSTbreakPoint () at ../src/RTlinux.c:116
#3  0x00007f9593b0e92b in DYNINST_instExitEntry (arg1=0x0) at ../src/RTcommon.c:399
#4  0x00007f9593da48b8 in DYNINSTstaticHeap_16M_anyHeap_1 () from /usr/lib/libdyninstAPI_RT.so
#5  0x00007f9594db59f8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000000000 in ?? ()

Okay, I can explain this at least in part, and the parsing is not a bug
but it's not intuitive either.

We found a call from another function targeting 7f17226f7899 (the zero
eax/syscall block). That call causes us to treat that block and the
following return block as its own micro-function (since it's reached by
a call instruction), and all edges from read to that function as
interprocedural. This is a direct consequence of our "functions have
single entry points" abstraction, which has very nice properties for
both analysis and instrumentation, but it can produce confusing results
(as you see here).

If you're just using Dyninst for binary analysis, you may want to open
your binaries in rewriting mode (openBinary rather than attachProcess).
If you're going to work with a running process, in order to exit
cleanly, you'll want something like the following after you're done with
analysis:

do {
       process->continueExecution();
       bpatch->waitForStatusChange();
} while (!process->isTerminated());

to continue the process with Dyninst still attached to it, or

process->detach(true);

to detach and let it exit cleanly. Otherwise, the mutator won't be
present to handle various bits of instrumentation that we insert into
the mutatee by default (e.g. for exit callbacks) and the mutatee can
crash (as some of that instrumentation includes traps). If you still see
mutatee crashes under DYNINST_DEBUG_PARSING when you're cleaning up
properly, let me know and I'll see if I can get a fix under the wire for
8.1.2.

--bw


Marc


On 21/05/2013 23:19, Bill Williams wrote:
Marc--

That looks like a bug to me. Can you set the environment variable
DYNINST_DEBUG_PARSING to 1, run your test, and send me the output that
produces?

Thanks.

--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx

On 05/21/2013 07:12 AM, Marc Brünink wrote:
Hi,

I just started using Dyninst and have a small question regarding basic
blocks.

I have a micro test program that opens a file and reads some data from
it. I am having issues with the basic blocks of the read function.
Basically I'm missing 2 basic blocks.

Using function.getCFG()->getAllBasicBlocks(bbs) I get the following
basic blocks:

Basic Block (7f17226f7890 to 7f17226f7899) (entry: 1) (exit: 0):
          7f17226f7890    cmp [RIP + 2d793d], 0
          7f17226f7897    jnz 10 + RIP + 2
Basic Block (7f17226f78a9 to 7f17226f78b2) (entry: 0) (exit: 0):
          7f17226f78a9    sub RSP, 8
          7f17226f78ad    call 1b13e + RIP + 5
Basic Block (7f17226f78b2 to 7f17226f78c9) (entry: 0) (exit: 0):
          7f17226f78b2    mov [ESP], RAX
          7f17226f78b6    mov RAX, 0
          7f17226f78bb    syscall RCX
          7f17226f78bd    mov RDI, [ESP]
          7f17226f78c1    mov RDX, RAX
          7f17226f78c4    call 1b187 + RIP + 5
Basic Block (7f17226f78c9 to 7f17226f78d8) (entry: 0) (exit: 0):
          7f17226f78c9    mov RAX, RDX
          7f17226f78cc    add RSP, 8
          7f17226f78d0    cmp RAX, fffff001
          7f17226f78d6    jnb/jae/j 1 + RIP + 2
Basic Block (7f17226f78d8 to 7f17226f78d9) (entry: 0) (exit: 1):
          7f17226f78d8    ret near [RSP]
Basic Block (7f17226f78d9 to 7f17226f78ee) (entry: 0) (exit: 0):
          7f17226f78d9    mov RCX, [RIP + 2d1540]
          7f17226f78e0    xor RDX, RDX
          7f17226f78e2    sub RDX, RAX
          7f17226f78e5    mov [RCX], RDX
          7f17226f78e8    or RAX, ff
          7f17226f78ec    jmp ffffffffffffffea + RIP + 2


Using GDB I get this:

     0x00007ffff7b01890 <+0>:     cmpl   $0x0,0x2d793d(%rip)        # 0x7ffff7dd91d4
=> 0x00007ffff7b01897 <+7>:     jne    0x7ffff7b018a9 <read+25>
     0x00007ffff7b01899 <+9>:     mov    $0x0,%eax
     0x00007ffff7b0189e <+14>:    syscall
     0x00007ffff7b018a0 <+16>:    cmp    $0xfffffffffffff001,%rax
     0x00007ffff7b018a6 <+22>:    jae    0x7ffff7b018d9 <read+73>
     0x00007ffff7b018a8 <+24>:    retq
     0x00007ffff7b018a9 <+25>:    sub    $0x8,%rsp
     0x00007ffff7b018ad <+29>:    callq  0x7ffff7b1c9f0
     0x00007ffff7b018b2 <+34>:    mov    %rax,(%rsp)
     0x00007ffff7b018b6 <+38>:    mov    $0x0,%eax
     0x00007ffff7b018bb <+43>:    syscall
     0x00007ffff7b018bd <+45>:    mov    (%rsp),%rdi
     0x00007ffff7b018c1 <+49>:    mov    %rax,%rdx
     0x00007ffff7b018c4 <+52>:    callq  0x7ffff7b1ca50
     0x00007ffff7b018c9 <+57>:    mov    %rdx,%rax
     0x00007ffff7b018cc <+60>:    add    $0x8,%rsp
     0x00007ffff7b018d0 <+64>:    cmp    $0xfffffffffffff001,%rax
     0x00007ffff7b018d6 <+70>:    jae    0x7ffff7b018d9 <read+73>
     0x00007ffff7b018d8 <+72>:    retq
     0x00007ffff7b018d9 <+73>:    mov    0x2d1540(%rip),%rcx        # 0x7ffff7dd2e20
     0x00007ffff7b018e0 <+80>:    xor    %edx,%edx
     0x00007ffff7b018e2 <+82>:    sub    %rax,%rdx
     0x00007ffff7b018e5 <+85>:    mov    %edx,%fs:(%rcx)
     0x00007ffff7b018e8 <+88>:    or     $0xffffffffffffffff,%rax
     0x00007ffff7b018ec <+92>:    jmp    0x7ffff7b018d8 <read+72>


So I am basically missing the 2 blocks starting at 0x00007ffff7b01899
and 0x00007ffff7b018a8.
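
The two listings come from different runs with different load addresses, so the comparison only works on offsets from the start of read. A minimal sketch of that comparison in plain C++ follows. The helper names are hypothetical, and the GDB block starts are derived by hand from the disassembly above (branch targets plus the instruction after each branch or call), so treat them as an assumption rather than tool output.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using Addr = std::uint64_t;

// Convert absolute block start addresses to offsets from the function entry.
std::set<Addr> toOffsets(const std::vector<Addr>& starts, Addr entry) {
    std::set<Addr> out;
    for (Addr a : starts) out.insert(a - entry);
    return out;
}

// Offsets present in `reference` but absent from `reported`.
std::set<Addr> missingOffsets(const std::set<Addr>& reference,
                              const std::set<Addr>& reported) {
    std::set<Addr> out;
    for (Addr off : reference) {
        if (reported.count(off) == 0) out.insert(off);
    }
    return out;
}

// Block starts reported by Dyninst versus block starts read off the
// GDB listing, each normalised against its own address for read.
std::set<Addr> missingInReadExample() {
    const std::vector<Addr> dyninst = {
        0x7f17226f7890, 0x7f17226f78a9, 0x7f17226f78b2,
        0x7f17226f78c9, 0x7f17226f78d8, 0x7f17226f78d9};
    const std::vector<Addr> gdb = {
        0x7ffff7b01890, 0x7ffff7b01899, 0x7ffff7b018a8, 0x7ffff7b018a9,
        0x7ffff7b018b2, 0x7ffff7b018c9, 0x7ffff7b018d8, 0x7ffff7b018d9};
    return missingOffsets(toOffsets(gdb, 0x7ffff7b01890),
                          toOffsets(dyninst, 0x7f17226f7890));
}
```

Calling missingInReadExample() returns the offsets 9 and 24, which correspond to the blocks at <+9> and <+24>.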

The edge between 0x00007ffff7b01890 and 0x00007ffff7b01899 is classified
as an interprocedural tail call (why?). Shouldn't the block still be part
of the function?

Marc
_______________________________________________
Dyninst-api mailing list
Dyninst-api@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api





