Is there any functionality that helps me to instrument the (non-existing) return edges of tail calls? Can Dyninst automatically de-optimise tail calls or do I have to do it manually? What is the proper way to distinguish an exit point with a tail call from one without? All solutions I can come up with seem to be pretty ugly and I suppose there should be an easier way.
BTW: If I ask for getExitBasicBlock on the CFG of function F1 it returns the blocks containing <+7> and <+72>. I basically get the same with findPoint(BPatch_exit).
However, if I call isExitBlock on the returned blocks it returns false for the one containing <+7>.
Marc
>> F1 F2
>> x     0x00007ffff7b01890 <+0>: cmpl $0x0,0x2d793d(%rip)
>> x     0x00007ffff7b01897 <+7>: jne 0x7ffff7b018a9 <read+25>
>>    x  0x00007ffff7b01899 <+9>: mov $0x0,%eax
>>    x  0x00007ffff7b0189e <+14>: syscall
>>    x  0x00007ffff7b018a0 <+16>: cmp $0xfffffffffffff001,%rax
>>    x  0x00007ffff7b018a6 <+22>: jae 0x7ffff7b018d9 <read+73>
>>    x  0x00007ffff7b018a8 <+24>: retq
>> x     0x00007ffff7b018a9 <+25>: sub $0x8,%rsp
>> x     0x00007ffff7b018ad <+29>: callq 0x7ffff7b1c9f0
>> x     0x00007ffff7b018b2 <+34>: mov %rax,(%rsp)
>> x     0x00007ffff7b018b6 <+38>: mov $0x0,%eax
>> x     0x00007ffff7b018bb <+43>: syscall
>> x     0x00007ffff7b018bd <+45>: mov (%rsp),%rdi
>> x     0x00007ffff7b018c1 <+49>: mov %rax,%rdx
>> x     0x00007ffff7b018c4 <+52>: callq 0x7ffff7b1ca50
>> x     0x00007ffff7b018c9 <+57>: mov %rdx,%rax
>> x     0x00007ffff7b018cc <+60>: add $0x8,%rsp
>> x     0x00007ffff7b018d0 <+64>: cmp $0xfffffffffffff001,%rax
>> x     0x00007ffff7b018d6 <+70>: jae 0x7ffff7b018d9 <read+73>
>> x  x  0x00007ffff7b018d8 <+72>: retq
>> x  x  0x00007ffff7b018d9 <+73>: mov 0x2d1540(%rip),%rcx
>> x  x  0x00007ffff7b018e0 <+80>: xor %edx,%edx
>> x  x  0x00007ffff7b018e2 <+82>: sub %rax,%rdx
>> x  x  0x00007ffff7b018e5 <+85>: mov %edx,%fs:(%rcx)
>> x  x  0x00007ffff7b018e8 <+88>: or $0xffffffffffffffff,%rax
>> x  x  0x00007ffff7b018ec <+92>: jmp 0x7ffff7b018d8 <read+72>
>>
>>
>> But if Dyninst shares basic blocks, I fail to see why the block at <+9> cannot be shared as well, unless "having a single entry point" means that an entry basic block cannot be shared. Is there a technical reason why the entry basic block cannot be shared with another function? Or is it just that Dyninst first declares <+9> as an entry point of a function and then fails to realise that it is actually a shared block?
>>
> That's precisely it; blocks can be shared but entry blocks cannot be
> shared. I believe the below is a complete list of how we classify things
> in parsing, though I may be missing a corner case or two.
>
> * The entry point of the binary is a function entry point
> * Anything with a function symbol pointing to it is a function entry point
> * Anything reached by a call instruction that is *not* a getpc call of
> some form is a function entry point; getpc calls are elided (as we need
> to modify them when we move code)
> * Any edge targeting a function entry point is interprocedural
> * Any return edge is interprocedural
> * Any edge that we believe is a tail call based on stack heuristics is
> interprocedural
> * A function, then, becomes the set of blocks dominated by an entry
> block and reachable without using interprocedural edges
>
> --bw
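
The classification rules listed above can be tried out on the read example with a small toy model. This is only a sketch under stated assumptions, not Dyninst's parser or its API: the names Block, Cfg, makeReadCfg and functionBlocks are made up for illustration, blocks are identified by their read+N offset, call targets and return edges are left out (a return block simply has no successors), and plain reachability stands in for the dominance condition.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical toy types: a block is named by its offset from read (read+N),
// and the CFG maps each block to its successor blocks.
using Block = int;
using Cfg = std::map<Block, std::vector<Block>>;

// Successor edges of read as they appear in the GDB disassembly.
// Call targets are other functions and are omitted; only the
// call fall-through edges (25 -> 34, 34 -> 57) are kept.
Cfg makeReadCfg() {
    return Cfg{
        {0,  {9, 25}},   // jne: fall through to <+9>, taken to <+25>
        {9,  {24, 73}},  // jae: fall through to <+24>, taken to <+73>
        {24, {}},        // retq
        {25, {34}},      // callq, then fall through
        {34, {57}},      // callq, then fall through
        {57, {72, 73}},  // jae: fall through to <+72>, taken to <+73>
        {72, {}},        // retq
        {73, {72}},      // jmp <+72>
    };
}

// Collect the blocks reachable from `entry` without following any edge that
// targets a function entry block (such edges count as interprocedural).
std::set<Block> functionBlocks(const Cfg& cfg,
                               const std::set<Block>& entries,
                               Block entry) {
    assert(entries.count(entry) && "start block must be a function entry");
    std::set<Block> seen{entry};
    std::queue<Block> work;
    work.push(entry);
    while (!work.empty()) {
        Block b = work.front();
        work.pop();
        auto it = cfg.find(b);
        if (it == cfg.end()) continue;
        for (Block t : it->second) {
            if (entries.count(t)) continue;  // interprocedural: do not follow
            if (seen.insert(t).second) work.push(t);
        }
    }
    return seen;
}
```

With entries {0, 9} (the symbol read, plus <+9> because another function calls it), functionBlocks(makeReadCfg(), {0, 9}, 0) gives {0, 25, 34, 57, 72, 73} and starting from 9 gives {9, 24, 72, 73}. In this model the blocks at <+72> and <+73> end up in both functions, while <+9> and <+24> belong only to the second one.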
>
>> (I cc'ed the list again because I think this might be worth archiving, unlike the previous message, which just contained a large tar.)
>>
>> Marc
>>
>>
>> On May 24, 2013, at 12:02 AM, Bill Williams wrote:
>>
>>> On 05/21/2013 08:38 PM, Marc Brünink wrote:
>>>> Output attached. If you need anything else, just let me know.
>>>>
>>>> BTW: setting DYNINST_DEBUG_PARSING=1 leads to a bus error in the mutatee.
>>>>
>>>> #0 0x00007f9594e94ed9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #1 0x00007f9593b0cc10 in t_kill (pid=7054, sig=7) at ../src/RTlinux.c:94
>>>> #2 0x00007f9593b0d0e3 in DYNINSTbreakPoint () at ../src/RTlinux.c:116
>>>> #3 0x00007f9593b0e92b in DYNINST_instExitEntry (arg1=0x0) at ../src/RTcommon.c:399
>>>> #4 0x00007f9593da48b8 in DYNINSTstaticHeap_16M_anyHeap_1 () from /usr/lib/libdyninstAPI_RT.so
>>>> #5 0x00007f9594db59f8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #6 0x0000000000000000 in ?? ()
>>>>
>>> Okay, I can explain this at least in part, and the parsing is not a bug
>>> but it's not intuitive either.
>>>
>>> We found a call from another function targeting 7f17226f7899 (the zero
>>> eax/syscall block). That call causes us to treat that block and the
>>> following return block as its own micro-function (since it's reached by
>>> a call instruction), and all edges from read to that function as
>>> interprocedural. This is a direct consequence of our "functions have
>>> single entry points" abstraction, which has very nice properties for
>>> both analysis and instrumentation, but it can produce confusing results
>>> (as you see here).
>>>
>>> If you're just using Dyninst for binary analysis, you may want to open
>>> your binaries in rewriting mode (openBinary rather than attachProcess).
>>> If you're going to work with a running process, in order to exit
>>> cleanly, you'll want something like the following after you're done with
>>> analysis:
>>>
>>> do {
>>>     process->continueExecution();
>>>     bpatch->waitForStatusChange();
>>> } while (!process->isTerminated());
>>>
>>> to continue the process with Dyninst still attached to it, or
>>>
>>> process->detach(true);
>>>
>>> to detach and let it exit cleanly. Otherwise, the mutator won't be
>>> present to handle various bits of instrumentation that we insert into
>>> the mutatee by default (e.g. for exit callbacks) and the mutatee can
>>> crash (as some of that instrumentation includes traps). If you still see
>>> mutatee crashes under DYNINST_DEBUG_PARSING when you're cleaning up
>>> properly, let me know and I'll see if I can get a fix under the wire for
>>> 8.1.2.
>>>
>>> --bw
>>>
>>>>
>>>> Marc
>>>>
>>>>
>>>> On 21/05/2013 23:19, Bill Williams wrote:
>>>>> Marc--
>>>>>
>>>>> That looks like a bug to me. Can you set the environment variable
>>>>> DYNINST_DEBUG_PARSING to 1, run your test, and send me the output that
>>>>> produces?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --bw
>>>>>
>>>>> Bill Williams
>>>>> Paradyn Project
>>>>> bill@xxxxxxxxxxx
>>>>>
>>>>> On 05/21/2013 07:12 AM, Marc Brünink wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I just started using Dyninst and have a small question regarding basic
>>>>>> blocks.
>>>>>>
>>>>>> I have a micro test program that opens a file and reads some data from
>>>>>> it. I am having issues with the basic blocks of the read function.
>>>>>> Basically I'm missing 2 basic blocks.
>>>>>>
>>>>>> Using function.getCFG()->getAllBasicBlocks(bbs) I get the following
>>>>>> basic blocks:
>>>>>>
>>>>>> Basic Block (7f17226f7890 to 7f17226f7899) (entry: 1) (exit: 0):
>>>>>> 7f17226f7890 cmp [RIP + 2d793d], 0
>>>>>> 7f17226f7897 jnz 10 + RIP + 2
>>>>>> Basic Block (7f17226f78a9 to 7f17226f78b2) (entry: 0) (exit: 0):
>>>>>> 7f17226f78a9 sub RSP, 8
>>>>>> 7f17226f78ad call 1b13e + RIP + 5
>>>>>> Basic Block (7f17226f78b2 to 7f17226f78c9) (entry: 0) (exit: 0):
>>>>>> 7f17226f78b2 mov [ESP], RAX
>>>>>> 7f17226f78b6 mov RAX, 0
>>>>>> 7f17226f78bb syscall RCX
>>>>>> 7f17226f78bd mov RDI, [ESP]
>>>>>> 7f17226f78c1 mov RDX, RAX
>>>>>> 7f17226f78c4 call 1b187 + RIP + 5
>>>>>> Basic Block (7f17226f78c9 to 7f17226f78d8) (entry: 0) (exit: 0):
>>>>>> 7f17226f78c9 mov RAX, RDX
>>>>>> 7f17226f78cc add RSP, 8
>>>>>> 7f17226f78d0 cmp RAX, fffff001
>>>>>> 7f17226f78d6 jnb/jae/j 1 + RIP + 2
>>>>>> Basic Block (7f17226f78d8 to 7f17226f78d9) (entry: 0) (exit: 1):
>>>>>> 7f17226f78d8 ret near [RSP]
>>>>>> Basic Block (7f17226f78d9 to 7f17226f78ee) (entry: 0) (exit: 0):
>>>>>> 7f17226f78d9 mov RCX, [RIP + 2d1540]
>>>>>> 7f17226f78e0 xor RDX, RDX
>>>>>> 7f17226f78e2 sub RDX, RAX
>>>>>> 7f17226f78e5 mov [RCX], RDX
>>>>>> 7f17226f78e8 or RAX, ff
>>>>>> 7f17226f78ec jmp ffffffffffffffea + RIP + 2
>>>>>>
>>>>>>
>>>>>> Using GDB I get this:
>>>>>>
>>>>>> 0x00007ffff7b01890 <+0>: cmpl $0x0,0x2d793d(%rip) # 0x7ffff7dd91d4
>>>>>> => 0x00007ffff7b01897 <+7>: jne 0x7ffff7b018a9 <read+25>
>>>>>> 0x00007ffff7b01899 <+9>: mov $0x0,%eax
>>>>>> 0x00007ffff7b0189e <+14>: syscall
>>>>>> 0x00007ffff7b018a0 <+16>: cmp $0xfffffffffffff001,%rax
>>>>>> 0x00007ffff7b018a6 <+22>: jae 0x7ffff7b018d9 <read+73>
>>>>>> 0x00007ffff7b018a8 <+24>: retq
>>>>>> 0x00007ffff7b018a9 <+25>: sub $0x8,%rsp
>>>>>> 0x00007ffff7b018ad <+29>: callq 0x7ffff7b1c9f0
>>>>>> 0x00007ffff7b018b2 <+34>: mov %rax,(%rsp)
>>>>>> 0x00007ffff7b018b6 <+38>: mov $0x0,%eax
>>>>>> 0x00007ffff7b018bb <+43>: syscall
>>>>>> 0x00007ffff7b018bd <+45>: mov (%rsp),%rdi
>>>>>> 0x00007ffff7b018c1 <+49>: mov %rax,%rdx
>>>>>> 0x00007ffff7b018c4 <+52>: callq 0x7ffff7b1ca50
>>>>>> 0x00007ffff7b018c9 <+57>: mov %rdx,%rax
>>>>>> 0x00007ffff7b018cc <+60>: add $0x8,%rsp
>>>>>> 0x00007ffff7b018d0 <+64>: cmp $0xfffffffffffff001,%rax
>>>>>> 0x00007ffff7b018d6 <+70>: jae 0x7ffff7b018d9 <read+73>
>>>>>> 0x00007ffff7b018d8 <+72>: retq
>>>>>> 0x00007ffff7b018d9 <+73>: mov 0x2d1540(%rip),%rcx # 0x7ffff7dd2e20
>>>>>> 0x00007ffff7b018e0 <+80>: xor %edx,%edx
>>>>>> 0x00007ffff7b018e2 <+82>: sub %rax,%rdx
>>>>>> 0x00007ffff7b018e5 <+85>: mov %edx,%fs:(%rcx)
>>>>>> 0x00007ffff7b018e8 <+88>: or $0xffffffffffffffff,%rax
>>>>>> 0x00007ffff7b018ec <+92>: jmp 0x7ffff7b018d8 <read+72>
>>>>>>
>>>>>>
>>>>>> So I am basically missing the 2 blocks starting at 0x00007ffff7b01899
>>>>>> and 0x00007ffff7b018a8.
>>>>>>
>>>>>> The edge between 0x00007ffff7b01890 and 0x00007ffff7b01899 is classified
>>>>>> as an interprocedural tail call (why?). Shouldn't the block still be part
>>>>>> of the function?
>>>>>>
>>>>>> Marc
>>>>>> _______________________________________________
>>>>>> Dyninst-api mailing list
>>>>>> Dyninst-api@xxxxxxxxxxx
>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api