Re: [DynInst_API:] Basic basic block usage


Date: Fri, 24 May 2013 12:33:04 +0800
From: Marc Brünink <marc@xxxxxxxxxx>
Subject: Re: [DynInst_API:] Basic basic block usage
Hi Bill,

thanks for your answer. I appreciate it very much.

I actually had problems because I expected that once an edge is marked interprocedural, all following edges are also interprocedural. Thus I was a bit puzzled when I saw that the edge from <+7> to <+9> was classified as interprocedural and the basic block at <+9> was skipped, while the basic block at <+73> was still included in the function, even though it is reachable via the block at <+9>, which had just been declared not to be part of the function.

But I forgot that basic blocks might be shared.
So Dyninst creates the following two functions (I suppose; not verified):

 
F1	F2	Instruction
x		0x00007ffff7b01890 <+0>:     cmpl   $0x0,0x2d793d(%rip)
x		0x00007ffff7b01897 <+7>:     jne    0x7ffff7b018a9 <read+25>
	x	0x00007ffff7b01899 <+9>:     mov    $0x0,%eax
	x	0x00007ffff7b0189e <+14>:    syscall
	x	0x00007ffff7b018a0 <+16>:    cmp    $0xfffffffffffff001,%rax
	x	0x00007ffff7b018a6 <+22>:    jae    0x7ffff7b018d9 <read+73>
	x	0x00007ffff7b018a8 <+24>:    retq
x		0x00007ffff7b018a9 <+25>:    sub    $0x8,%rsp
x		0x00007ffff7b018ad <+29>:    callq  0x7ffff7b1c9f0
x		0x00007ffff7b018b2 <+34>:    mov    %rax,(%rsp)
x		0x00007ffff7b018b6 <+38>:    mov    $0x0,%eax
x		0x00007ffff7b018bb <+43>:    syscall
x		0x00007ffff7b018bd <+45>:    mov    (%rsp),%rdi
x		0x00007ffff7b018c1 <+49>:    mov    %rax,%rdx
x		0x00007ffff7b018c4 <+52>:    callq  0x7ffff7b1ca50
x		0x00007ffff7b018c9 <+57>:    mov    %rdx,%rax
x		0x00007ffff7b018cc <+60>:    add    $0x8,%rsp
x		0x00007ffff7b018d0 <+64>:    cmp    $0xfffffffffffff001,%rax
x		0x00007ffff7b018d6 <+70>:    jae    0x7ffff7b018d9 <read+73>
x	x	0x00007ffff7b018d8 <+72>:    retq
x	x	0x00007ffff7b018d9 <+73>:    mov    0x2d1540(%rip),%rcx
x	x	0x00007ffff7b018e0 <+80>:    xor    %edx,%edx
x	x	0x00007ffff7b018e2 <+82>:    sub    %rax,%rdx
x	x	0x00007ffff7b018e5 <+85>:    mov    %edx,%fs:(%rcx)
x	x	0x00007ffff7b018e8 <+88>:    or     $0xffffffffffffffff,%rax
x	x	0x00007ffff7b018ec <+92>:    jmp    0x7ffff7b018d8 <read+72>


But if Dyninst shares basic blocks, I fail to see why the block at <+9> cannot be shared as well, unless "having a single entry point" means that an entry basic block cannot be shared. Is there a technical reason why the entry basic block cannot be shared with another function? Or does Dyninst first declare <+9> as the entry point of a function and then fail to realise that it is actually a shared block?

(I cc'ed the list again because I think this might be worth archiving, unlike the previous message, which just contained a large tarball.)

Marc


On May 24, 2013, at 12:02 AM, Bill Williams wrote:

> On 05/21/2013 08:38 PM, Marc Brünink wrote:
>> Output attached. If you need anything else, just let me know.
>> 
>> BTW: setting DYNINST_DEBUG_PARSING=1 leads to a bus error in the mutatee.
>> 
>> #0  0x00007f9594e94ed9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00007f9593b0cc10 in t_kill (pid=7054, sig=7) at ../src/RTlinux.c:94
>> #2  0x00007f9593b0d0e3 in DYNINSTbreakPoint () at ../src/RTlinux.c:116
>> #3  0x00007f9593b0e92b in DYNINST_instExitEntry (arg1=0x0) at ../src/RTcommon.c:399
>> #4  0x00007f9593da48b8 in DYNINSTstaticHeap_16M_anyHeap_1 () from /usr/lib/libdyninstAPI_RT.so
>> #5  0x00007f9594db59f8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>> 
> Okay, I can explain this at least in part. The parsing is not a bug,
> but it's not intuitive either.
> 
> We found a call from another function targeting 7f17226f7899 (the zero 
> eax/syscall block). That call causes us to treat that block and the 
> following return block as its own micro-function (since it's reached by 
> a call instruction), and all edges from read to that function as 
> interprocedural. This is a direct consequence of our "functions have 
> single entry points" abstraction, which has very nice properties for 
> both analysis and instrumentation, but it can produce confusing results 
> (as you see here).
> 
> If you're just using Dyninst for binary analysis, you may want to open 
> your binaries in rewriting mode (openBinary rather than attachProcess). 
> If you're going to work with a running process, in order to exit 
> cleanly, you'll want something like the following after you're done with 
> analysis:
> 
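> // Keep Dyninst attached: resume the mutatee and wait for status
> // changes until it terminates.
> do {
>       process->continueExecution();
>       bpatch->waitForStatusChange();
> } while (!process->isTerminated());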
> 
> to continue the process with Dyninst still attached to it, or
> 
> process->detach(true);
> 
> to detach and let it exit cleanly. Otherwise, the mutator won't be 
> present to handle various bits of instrumentation that we insert into 
> the mutatee by default (e.g. for exit callbacks) and the mutatee can 
> crash (as some of that instrumentation includes traps). If you still see 
> mutatee crashes under DYNINST_DEBUG_PARSING when you're cleaning up 
> properly, let me know and I'll see if I can get a fix under the wire for 
> 8.1.2.
> 
> --bw
> 
>> 
>> Marc
>> 
>> 
>> On 21/05/2013 23:19, Bill Williams wrote:
>>> Marc--
>>> 
>>> That looks like a bug to me. Can you set the environment variable
>>> DYNINST_DEBUG_PARSING to 1, run your test, and send me the output that
>>> produces?
>>> 
>>> Thanks.
>>> 
>>> --bw
>>> 
>>> Bill Williams
>>> Paradyn Project
>>> bill@xxxxxxxxxxx
>>> 
>>> On 05/21/2013 07:12 AM, Marc Brünink wrote:
>>>> Hi,
>>>> 
>>>> I just started using Dyninst and have a small question regarding basic
>>>> blocks.
>>>> 
>>>> I have a micro test program that opens a file and reads some data from
>>>> it. I am having issues with the basic blocks of the read function.
>>>> Basically I'm missing 2 basic blocks.
>>>> 
>>>> Using function.getCFG()->getAllBasicBlocks(bbs) I get the following
>>>> basic blocks:
>>>> 
>>>> Basic Block (7f17226f7890 to 7f17226f7899) (entry: 1) (exit: 0):
>>>>          7f17226f7890    cmp [RIP + 2d793d], 0
>>>>          7f17226f7897    jnz 10 + RIP + 2
>>>> Basic Block (7f17226f78a9 to 7f17226f78b2) (entry: 0) (exit: 0):
>>>>          7f17226f78a9    sub RSP, 8
>>>>          7f17226f78ad    call 1b13e + RIP + 5
>>>> Basic Block (7f17226f78b2 to 7f17226f78c9) (entry: 0) (exit: 0):
>>>>          7f17226f78b2    mov [ESP], RAX
>>>>          7f17226f78b6    mov RAX, 0
>>>>          7f17226f78bb    syscall RCX
>>>>          7f17226f78bd    mov RDI, [ESP]
>>>>          7f17226f78c1    mov RDX, RAX
>>>>          7f17226f78c4    call 1b187 + RIP + 5
>>>> Basic Block (7f17226f78c9 to 7f17226f78d8) (entry: 0) (exit: 0):
>>>>          7f17226f78c9    mov RAX, RDX
>>>>          7f17226f78cc    add RSP, 8
>>>>          7f17226f78d0    cmp RAX, fffff001
>>>>          7f17226f78d6    jnb/jae/j 1 + RIP + 2
>>>> Basic Block (7f17226f78d8 to 7f17226f78d9) (entry: 0) (exit: 1):
>>>>          7f17226f78d8    ret near [RSP]
>>>> Basic Block (7f17226f78d9 to 7f17226f78ee) (entry: 0) (exit: 0):
>>>>          7f17226f78d9    mov RCX, [RIP + 2d1540]
>>>>          7f17226f78e0    xor RDX, RDX
>>>>          7f17226f78e2    sub RDX, RAX
>>>>          7f17226f78e5    mov [RCX], RDX
>>>>          7f17226f78e8    or RAX, ff
>>>>          7f17226f78ec    jmp ffffffffffffffea + RIP + 2
>>>> 
>>>> 
>>>> Using GDB I get this:
>>>> 
>>>>     0x00007ffff7b01890 <+0>:     cmpl   $0x0,0x2d793d(%rip)        # 0x7ffff7dd91d4
>>>> => 0x00007ffff7b01897 <+7>:     jne    0x7ffff7b018a9 <read+25>
>>>>     0x00007ffff7b01899 <+9>:     mov    $0x0,%eax
>>>>     0x00007ffff7b0189e <+14>:    syscall
>>>>     0x00007ffff7b018a0 <+16>:    cmp    $0xfffffffffffff001,%rax
>>>>     0x00007ffff7b018a6 <+22>:    jae    0x7ffff7b018d9 <read+73>
>>>>     0x00007ffff7b018a8 <+24>:    retq
>>>>     0x00007ffff7b018a9 <+25>:    sub    $0x8,%rsp
>>>>     0x00007ffff7b018ad <+29>:    callq  0x7ffff7b1c9f0
>>>>     0x00007ffff7b018b2 <+34>:    mov    %rax,(%rsp)
>>>>     0x00007ffff7b018b6 <+38>:    mov    $0x0,%eax
>>>>     0x00007ffff7b018bb <+43>:    syscall
>>>>     0x00007ffff7b018bd <+45>:    mov    (%rsp),%rdi
>>>>     0x00007ffff7b018c1 <+49>:    mov    %rax,%rdx
>>>>     0x00007ffff7b018c4 <+52>:    callq  0x7ffff7b1ca50
>>>>     0x00007ffff7b018c9 <+57>:    mov    %rdx,%rax
>>>>     0x00007ffff7b018cc <+60>:    add    $0x8,%rsp
>>>>     0x00007ffff7b018d0 <+64>:    cmp    $0xfffffffffffff001,%rax
>>>>     0x00007ffff7b018d6 <+70>:    jae    0x7ffff7b018d9 <read+73>
>>>>     0x00007ffff7b018d8 <+72>:    retq
>>>>     0x00007ffff7b018d9 <+73>:    mov    0x2d1540(%rip),%rcx        # 0x7ffff7dd2e20
>>>>     0x00007ffff7b018e0 <+80>:    xor    %edx,%edx
>>>>     0x00007ffff7b018e2 <+82>:    sub    %rax,%rdx
>>>>     0x00007ffff7b018e5 <+85>:    mov    %edx,%fs:(%rcx)
>>>>     0x00007ffff7b018e8 <+88>:    or     $0xffffffffffffffff,%rax
>>>>     0x00007ffff7b018ec <+92>:    jmp    0x7ffff7b018d8 <read+72>
>>>> 
>>>> 
>>>> So I am basically missing the 2 blocks starting at 0x00007ffff7b01899
>>>> and 0x00007ffff7b018a8.
>>>> 
>>>> The edge between 0x00007ffff7b01890 and 0x00007ffff7b01899 is classified
>>>> as an interprocedural tail call (why?). Shouldn't the block still be part
>>>> of the function?
>>>> 
>>>> Marc
>>>> _______________________________________________
>>>> Dyninst-api mailing list
>>>> Dyninst-api@xxxxxxxxxxx
>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>> 
>>> 
>>> 
>> 
> 
> 
> -- 
> --bw
> 
> Bill Williams
> Paradyn Project
> bill@xxxxxxxxxxx
