Re: [DynInst_API:] Basic basic block usage


Date: Wed, 29 May 2013 11:27:17 +0800
From: Marc Brünink <marc@xxxxxxxxxx>
Subject: Re: [DynInst_API:] Basic basic block usage
Is there any functionality that helps me instrument the (non-existent) return edges of tail calls? Can Dyninst automatically de-optimise tail calls, or do I have to do it manually? What is the proper way to distinguish an exit point with a tail call from one without? All the solutions I can come up with are pretty ugly, and I suspect there is an easier way.

BTW: If I call getExitBasicBlock on the CFG of function F1, it returns the blocks containing <+7> and <+72>. I get essentially the same result with findPoint(BPatch_exit).
However, if I call isExitBlock on the returned blocks, it returns false for the one containing <+7>.

Marc



>> F1	F2
>> x		0x00007ffff7b01890 <+0>:     cmpl   $0x0,0x2d793d(%rip)
>> x		0x00007ffff7b01897 <+7>:     jne    0x7ffff7b018a9 <read+25>
>> 	x	0x00007ffff7b01899 <+9>:     mov    $0x0,%eax
>> 	x	0x00007ffff7b0189e <+14>:    syscall
>> 	x	0x00007ffff7b018a0 <+16>:    cmp    $0xfffffffffffff001,%rax
>> 	x	0x00007ffff7b018a6 <+22>:    jae    0x7ffff7b018d9 <read+73>
>> 	x	0x00007ffff7b018a8 <+24>:    retq
>> x		0x00007ffff7b018a9 <+25>:    sub    $0x8,%rsp
>> x		0x00007ffff7b018ad <+29>:    callq  0x7ffff7b1c9f0
>> x		0x00007ffff7b018b2 <+34>:    mov    %rax,(%rsp)
>> x		0x00007ffff7b018b6 <+38>:    mov    $0x0,%eax
>> x		0x00007ffff7b018bb <+43>:    syscall
>> x		0x00007ffff7b018bd <+45>:    mov    (%rsp),%rdi
>> x		0x00007ffff7b018c1 <+49>:    mov    %rax,%rdx
>> x		0x00007ffff7b018c4 <+52>:    callq  0x7ffff7b1ca50
>> x		0x00007ffff7b018c9 <+57>:    mov    %rdx,%rax
>> x		0x00007ffff7b018cc <+60>:    add    $0x8,%rsp
>> x		0x00007ffff7b018d0 <+64>:    cmp    $0xfffffffffffff001,%rax
>> x		0x00007ffff7b018d6 <+70>:    jae    0x7ffff7b018d9 <read+73>
>> x	x	0x00007ffff7b018d8 <+72>:    retq
>> x	x	0x00007ffff7b018d9 <+73>:    mov    0x2d1540(%rip),%rcx
>> x	x	0x00007ffff7b018e0 <+80>:    xor    %edx,%edx
>> x	x	0x00007ffff7b018e2 <+82>:    sub    %rax,%rdx
>> x	x	0x00007ffff7b018e5 <+85>:    mov    %edx,%fs:(%rcx)
>> x	x	0x00007ffff7b018e8 <+88>:    or     $0xffffffffffffffff,%rax
>> x	x	0x00007ffff7b018ec <+92>:    jmp    0x7ffff7b018d8 <read+72>
>> 
>> 
>> But if Dyninst shares basic blocks, I fail to see why the block at <+9> cannot be shared as well, unless "having a single entry point" means an entry basic block cannot be shared. Is there a technical reason why the entry basic block cannot be shared with another function? Or is it just that Dyninst first declares <+9> as an entry point of a function and then fails to realise that it is actually a shared block?
>> 
> That's precisely it; blocks can be shared but entry blocks cannot be 
> shared. I believe the below is a complete list of how we classify things 
> in parsing, though I may be missing a corner case or two.
> 
> * The entry point of the binary is a function entry point
> * Anything with a function symbol pointing to it is a function entry point
> * Anything reached by a call instruction that is *not* a getpc call of 
> some form is a function entry point; getpc calls are elided (as we need 
> to modify them when we move code)
> * Any edge targeting a function entry point is interprocedural
> * Any return edge is interprocedural
> * Any edge that we believe is a tail call based on stack heuristics is 
> interprocedural
> * A function, then, becomes the set of blocks dominated by an entry 
> block and reachable without using interprocedural edges
> 
> --bw
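
The classification rules above can be illustrated with a small standalone sketch. It is not Dyninst code and does not use the Dyninst API: the Block and Cfg types and the functionBlocks and readCfg helpers are hypothetical names made up for this illustration. Blocks are named by their offset into read as shown in the GDB listing, the entry set {+0, +9} reflects the symbol for read plus the call target Bill describes, and only reachability is modelled (the dominance condition and the stack heuristics for tail calls are left out).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: a block is identified by its offset from `read`.
using Block = unsigned;
// Candidate intraprocedural successors of each block (branches,
// fall-throughs, call fall-throughs). Return edges are not modelled.
using Cfg = std::map<Block, std::vector<Block>>;

// Collect the blocks reachable from `entry` without following any edge
// whose target is a function entry point; per the rules quoted above,
// such edges are treated as interprocedural.
std::set<Block> functionBlocks(const Cfg &cfg,
                               const std::set<Block> &entries,
                               Block entry) {
    std::set<Block> seen{entry};
    std::vector<Block> work{entry};
    while (!work.empty()) {
        Block b = work.back();
        work.pop_back();
        auto it = cfg.find(b);
        if (it == cfg.end()) continue;      // block with no successors (retq)
        for (Block t : it->second) {
            if (entries.count(t)) continue; // interprocedural edge: skip
            if (seen.insert(t).second) work.push_back(t);
        }
    }
    return seen;
}

// Edges of `read`, transcribed from the disassembly in this thread.
Cfg readCfg() {
    return {
        {0,  {9, 25}},   // jne <+25>, else fall through to <+9>
        {9,  {24, 73}},  // jae <+73>, else fall through to <+24>
        {25, {34}},      // callq, then fall through
        {34, {57}},      // callq, then fall through
        {57, {72, 73}},  // jae <+73>, else fall through to <+72>
        {73, {72}},      // jmp <+72>
    };
}
```

Under these assumptions, functionBlocks(readCfg(), {0, 9}, 0) yields the F1 column of the annotated listing (+0, +25, +34, +57, +72, +73), and entry +9 yields the F2 column (+9, +24, +72, +73). The blocks at +72 and +73 end up in both sets, while +9 is excluded from F1 because the edge into it targets an entry point.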
> 
>> (I cc'ed the list again because I think this is worth archiving, unlike the previous message, which contained only a large tar file.)
>> 
>> Marc
>> 
>> 
>> On May 24, 2013, at 12:02 AM, Bill Williams wrote:
>> 
>>> On 05/21/2013 08:38 PM, Marc Brünink wrote:
>>>> Output attached. If you need anything else, just let me know.
>>>> 
>>>> BTW: setting DYNINST_DEBUG_PARSING=1 leads to a bus error in the mutatee.
>>>> 
>>>> #0  0x00007f9594e94ed9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #1  0x00007f9593b0cc10 in t_kill (pid=7054, sig=7) at ../src/RTlinux.c:94
>>>> #2  0x00007f9593b0d0e3 in DYNINSTbreakPoint () at ../src/RTlinux.c:116
>>>> #3  0x00007f9593b0e92b in DYNINST_instExitEntry (arg1=0x0) at ../src/RTcommon.c:399
>>>> #4  0x00007f9593da48b8 in DYNINSTstaticHeap_16M_anyHeap_1 () from /usr/lib/libdyninstAPI_RT.so
>>>> #5  0x00007f9594db59f8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #6  0x0000000000000000 in ?? ()
>>>> 
>>> Okay, I can explain this at least in part, and the parsing is not a bug
>>> but it's not intuitive either.
>>> 
>>> We found a call from another function targeting 7f17226f7899 (the zero
>>> eax/syscall block). That call causes us to treat that block and the
>>> following return block as its own micro-function (since it's reached by
>>> a call instruction), and all edges from read to that function as
>>> interprocedural. This is a direct consequence of our "functions have
>>> single entry points" abstraction, which has very nice properties for
>>> both analysis and instrumentation, but it can produce confusing results
>>> (as you see here).
>>> 
>>> If you're just using Dyninst for binary analysis, you may want to open
>>> your binaries in rewriting mode (openBinary rather than attachProcess).
>>> If you're going to work with a running process, in order to exit
>>> cleanly, you'll want something like the following after you're done with
>>> analysis:
>>> 
>>> do {
>>>       process->continueExecution();
>>>       bpatch->waitForStatusChange();
>>> } while (!process->isTerminated());
>>> 
>>> to continue the process with Dyninst still attached to it, or
>>> 
>>> process->detach(true);
>>> 
>>> to detach and let it exit cleanly. Otherwise, the mutator won't be
>>> present to handle various bits of instrumentation that we insert into
>>> the mutatee by default (e.g. for exit callbacks) and the mutatee can
>>> crash (as some of that instrumentation includes traps). If you still see
>>> mutatee crashes under DYNINST_DEBUG_PARSING when you're cleaning up
>>> properly, let me know and I'll see if I can get a fix under the wire for
>>> 8.1.2.
>>> 
>>> --bw
>>> 
>>>> 
>>>> Marc
>>>> 
>>>> 
>>>> On 21/05/2013 23:19, Bill Williams wrote:
>>>>> Marc--
>>>>> 
>>>>> That looks like a bug to me. Can you set the environment variable
>>>>> DYNINST_DEBUG_PARSING to 1, run your test, and send me the output that
>>>>> produces?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> --bw
>>>>> 
>>>>> Bill Williams
>>>>> Paradyn Project
>>>>> bill@xxxxxxxxxxx
>>>>> 
>>>>> On 05/21/2013 07:12 AM, Marc Brünink wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I just started using Dyninst and have a small question regarding basic
>>>>>> blocks.
>>>>>> 
>>>>>> I have a micro test program that opens a file and reads some data from
>>>>>> it. I am having issues with the basic blocks of the read function.
>>>>>> Basically I'm missing 2 basic blocks.
>>>>>> 
>>>>>> Using function.getCFG()->getAllBasicBlocks(bbs) I get the following
>>>>>> basic blocks:
>>>>>> 
>>>>>> Basic Block (7f17226f7890 to 7f17226f7899) (entry: 1) (exit: 0):
>>>>>>          7f17226f7890    cmp [RIP + 2d793d], 0
>>>>>>          7f17226f7897    jnz 10 + RIP + 2
>>>>>> Basic Block (7f17226f78a9 to 7f17226f78b2) (entry: 0) (exit: 0):
>>>>>>          7f17226f78a9    sub RSP, 8
>>>>>>          7f17226f78ad    call 1b13e + RIP + 5
>>>>>> Basic Block (7f17226f78b2 to 7f17226f78c9) (entry: 0) (exit: 0):
>>>>>>          7f17226f78b2    mov [ESP], RAX
>>>>>>          7f17226f78b6    mov RAX, 0
>>>>>>          7f17226f78bb    syscall RCX
>>>>>>          7f17226f78bd    mov RDI, [ESP]
>>>>>>          7f17226f78c1    mov RDX, RAX
>>>>>>          7f17226f78c4    call 1b187 + RIP + 5
>>>>>> Basic Block (7f17226f78c9 to 7f17226f78d8) (entry: 0) (exit: 0):
>>>>>>          7f17226f78c9    mov RAX, RDX
>>>>>>          7f17226f78cc    add RSP, 8
>>>>>>          7f17226f78d0    cmp RAX, fffff001
>>>>>>          7f17226f78d6    jnb/jae/j 1 + RIP + 2
>>>>>> Basic Block (7f17226f78d8 to 7f17226f78d9) (entry: 0) (exit: 1):
>>>>>>          7f17226f78d8    ret near [RSP]
>>>>>> Basic Block (7f17226f78d9 to 7f17226f78ee) (entry: 0) (exit: 0):
>>>>>>          7f17226f78d9    mov RCX, [RIP + 2d1540]
>>>>>>          7f17226f78e0    xor RDX, RDX
>>>>>>          7f17226f78e2    sub RDX, RAX
>>>>>>          7f17226f78e5    mov [RCX], RDX
>>>>>>          7f17226f78e8    or RAX, ff
>>>>>>          7f17226f78ec    jmp ffffffffffffffea + RIP + 2
>>>>>> 
>>>>>> 
>>>>>> Using GDB I get this:
>>>>>> 
>>>>>>     0x00007ffff7b01890 <+0>:     cmpl   $0x0,0x2d793d(%rip)        # 0x7ffff7dd91d4
>>>>>> => 0x00007ffff7b01897 <+7>:     jne    0x7ffff7b018a9 <read+25>
>>>>>>     0x00007ffff7b01899 <+9>:     mov    $0x0,%eax
>>>>>>     0x00007ffff7b0189e <+14>:    syscall
>>>>>>     0x00007ffff7b018a0 <+16>:    cmp    $0xfffffffffffff001,%rax
>>>>>>     0x00007ffff7b018a6 <+22>:    jae    0x7ffff7b018d9 <read+73>
>>>>>>     0x00007ffff7b018a8 <+24>:    retq
>>>>>>     0x00007ffff7b018a9 <+25>:    sub    $0x8,%rsp
>>>>>>     0x00007ffff7b018ad <+29>:    callq  0x7ffff7b1c9f0
>>>>>>     0x00007ffff7b018b2 <+34>:    mov    %rax,(%rsp)
>>>>>>     0x00007ffff7b018b6 <+38>:    mov    $0x0,%eax
>>>>>>     0x00007ffff7b018bb <+43>:    syscall
>>>>>>     0x00007ffff7b018bd <+45>:    mov    (%rsp),%rdi
>>>>>>     0x00007ffff7b018c1 <+49>:    mov    %rax,%rdx
>>>>>>     0x00007ffff7b018c4 <+52>:    callq  0x7ffff7b1ca50
>>>>>>     0x00007ffff7b018c9 <+57>:    mov    %rdx,%rax
>>>>>>     0x00007ffff7b018cc <+60>:    add    $0x8,%rsp
>>>>>>     0x00007ffff7b018d0 <+64>:    cmp    $0xfffffffffffff001,%rax
>>>>>>     0x00007ffff7b018d6 <+70>:    jae    0x7ffff7b018d9 <read+73>
>>>>>>     0x00007ffff7b018d8 <+72>:    retq
>>>>>>     0x00007ffff7b018d9 <+73>:    mov    0x2d1540(%rip),%rcx        # 0x7ffff7dd2e20
>>>>>>     0x00007ffff7b018e0 <+80>:    xor    %edx,%edx
>>>>>>     0x00007ffff7b018e2 <+82>:    sub    %rax,%rdx
>>>>>>     0x00007ffff7b018e5 <+85>:    mov    %edx,%fs:(%rcx)
>>>>>>     0x00007ffff7b018e8 <+88>:    or     $0xffffffffffffffff,%rax
>>>>>>     0x00007ffff7b018ec <+92>:    jmp    0x7ffff7b018d8 <read+72>
>>>>>> 
>>>>>> 
>>>>>> So I am basically missing the 2 blocks starting at 0x00007ffff7b01899
>>>>>> and 0x00007ffff7b018a8.
>>>>>> 
>>>>>> The edge between 0x00007ffff7b01890 and 0x00007ffff7b01899 is classified
>>>>>> as an interprocedural tail call (why?). Shouldn't the block still be part
>>>>>> of the function?
>>>>>> 
>>>>>> Marc
