Re: [DynInst_API:] Telling DynInst a particular function is non-returning


Date: Mon, 19 Mar 2018 17:56:49 +0000
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Telling DynInst a particular function is non-returning
For 1) I'd have to do some code diving, but I can do a lightning round of policy answers for 2-5:

2) There's not presently good documentation for this code, though I've got a paper on the back burner that will address that somewhat. I can give you an overview later when I'm not fighting all the fires.
3) Easy cleanups should always be fine.
4) We tend to prefer that reformatting come along with some substantial changes (or at a minimum substantial review). Please treat four space indents as standard if you do this. The worse the formatting of a file is the lower the bar for fixing it.
5) Sure, lifting things to better/cleaner style is also always fine.

--bw
________________________________________
From: Dyninst-api <dyninst-api-bounces@xxxxxxxxxxx> on behalf of Thomas Dullien <thomasdullien@xxxxxxxxxx>
Sent: Monday, March 19, 2018 11:08 AM
To: Xiaozhu Meng
Cc: dyninst-api
Subject: Re: [DynInst_API:] Telling DynInst a particular function is non-returning

Hey there,

after some more digging, some more questions :-) -- some pertaining to
my concrete question, some to the codebase
in general.

1) I *think* I have identified the problem:

Parser.C
1009: is_plt = HASHDEF(plt_entries,target);
1011: // CodeSource-defined tests
1012: is_nonret = obj().cs()->nonReturning(target);

In my example run, plt_entries is empty when the code evaluates
target=0x1030 (which is a jmp in the plt
to __stack_chk_fail). Which part of the code would be normally
responsible for initializing the plt entries? (see attached
binary).

This seems to happen because obj.cs()->linkage() seems to be empty
during the construction of the Parser.

The underlying culprit seems to be somewhere in SymtabCodeSource.C --
normally, one would think that init_linkage
would be responsible for filling the map. This function calls into
Symtab::getFuncBindingTable, which ... strangely ...
seems to only deal with a relocation table, not with a PLT ?

So my question is: What part of the code is responsible for
initializing the table with plt entries? :-)

2) Is there any documentation or rough overview about how the
disassembler inside ParseAPI works? It seems
the "workhorse" is parse_frames and parse_frame, but there is a fair
bit of complexity in those functions, and not
trivial to get a high-level overview :-)

3) Is it OK to submit pull requests with (semantics-preserving) code
cleanups? There are a few places with
pretty ugly hacks (for example the strcmp-statements around
SymtabCodeSource.C below the "Achin added code" comment),
and if nobody minds I'd clean them up :-)

4) Is it OK to submit pull requests with reformatting? The code seems
to currently mix tabs and spaces in different files,
and standardizing indentation on one of the two would make sense? The
downside of such reformatting-commits is that
they mess with git blame ...

5) Is it OK to submit pull requests that clean up iterator-based loops
to for-in loops? E.g. is it OK to lift the code to use
C++11 whenever it makes the code clearer / prettier?

Cheers,
Thomas


On Fri, Mar 16, 2018 at 4:56 PM, Thomas Dullien
<thomasdullien@xxxxxxxxxx> wrote:
> Hey there,
>
> update: It seems that the obj.cs()->linkage() at the beginning of
> Parser::Parser returns an empty
> plt ?
>
> Cheers,
> Thomas
>
> On Fri, Mar 16, 2018 at 4:39 PM, Thomas Dullien <thomasdullien@xxxxxxxxxx>
> wrote:
>>
>> Hey there,
>>
>> ok, tracing this a bit further, I have some more questions :-)
>>
>> If I am not mistaken, the code around Parser.C, line 1013 is supposed to
>> deal with calls to non-returning targets?
>>
>> is_nonret = obj().cs()->nonReturning(target);
>> if (is_nonret) {
>>  ....
>>
>> Now, in my example, for some reason the .plt.got stub function that jmps
>> straight to cs:__stack_chk_fail is not a
>> function:
>>
>> .plt.got:0000000000001030 __stack_chk_fail proc near              ; CODE
>> XREF: postproc_version:loc_11D0↓p
>> .plt.got:0000000000001030                                         ;
>> postproc_configuration:loc_1202↓p ...
>> .plt.got:0000000000001030                 jmp     cs:__stack_chk_fail_ptr
>> .plt.got:0000000000001030 __stack_chk_fail endp
>>
>> So when nonReturning(0x1030) is called, no function gets returned in
>> SymtabCodeSource::nonReturning while
>> calling findFuncByEntryOffset, and then the code mistakenly adds the
>> fallthrough edge.
>>
>> I am still finding my way through the codebase - but I presume there is
>> some initial pass that is supposed to create
>> functions from the stubs in the PLT and check them for non-returning
>> status?
>>
>> Cheers,
>> Thomas
>>
>>
>>
>> On Thu, Mar 15, 2018 at 6:33 PM, Thomas Dullien <thomasdullien@xxxxxxxxxx>
>> wrote:
>>>
>>> Hey there,
>>>
>>> one quick question: After backporting the 3 commits, I tested with both
>>> recursive-mode disassembly and non-recursive mode
>>> disassembly, and neither of the two worked. My question is now: Is this
>>> even supposed to work in non-recursive mode? E.g.
>>> if we are not recursing into a called function, we can't tell whether it
>>> returns or not, right?
>>>
>>> Cheers,
>>> Thomas
>>>
>>> On Thu, Mar 15, 2018 at 2:23 PM, Thomas Dullien
>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>
>>>> Hey there,
>>>>
>>>> ah, cool :-). I backported the 3 commits into the 9.3.2 codebase, but it
>>>> does not seem to solve the issue just yet.
>>>>
>>>> I will debug a bit more and report back once I understand what is going
>>>> on.
>>>>
>>>> Cheers,
>>>> Thomas
>>>>
>>>> On Thu, Mar 15, 2018 at 2:09 PM, Xiaozhu Meng <xmeng@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Ah, I have a few pending commits related to this issue. Please look at
>>>>> my pull request at https://github.com/dyninst/dyninst/pull/437.
>>>>>
>>>>> Not all commits are relevant to you, but bbac557, 78b1bd1, da50a37
>>>>> should be relevant.
>>>>>
>>>>> For parsing_printf, it is "DYNINST_DEBUG_PARSING".
>>>>>
>>>>>
>>>>> On Thu, Mar 15, 2018 at 8:02 AM, Thomas Dullien
>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hey there,
>>>>>>
>>>>>> ok, rebuilding DynInst 9.3.2 now to investigate why adding the string
>>>>>> does not seem to have any effect.
>>>>>> I looked at the source code, and I *think* the function is already in
>>>>>> the list of non-returning functions.
>>>>>>
>>>>>> Checking what is going on. Example setup:
>>>>>>
>>>>>> .plt.got:0000000000053CF8 __stack_chk_fail proc near              ;
>>>>>> CODE XREF: init:loc_54673↓p
>>>>>> .plt.got:0000000000053CF8                                         ;
>>>>>> uninit:loc_54705↓p ...
>>>>>> .plt.got:0000000000053CF8                 jmp
>>>>>> cs:__stack_chk_fail_ptr
>>>>>> .plt.got:0000000000053CF8 __stack_chk_fail endp
>>>>>>
>>>>>> .text:00000000000A4290 var_8           = qword ptr -8
>>>>>> .text:00000000000A4290
>>>>>> .text:00000000000A4290 graph = rdi                             ;
>>>>>> AVFilterGraph_0 *
>>>>>> .text:00000000000A4290 flags = rsi                             ;
>>>>>> unsigned int
>>>>>> .text:00000000000A4290 ; __unwind {
>>>>>> .text:00000000000A4290                 push    rax
>>>>>> .text:00000000000A4291                 mov     rax, fs:28h
>>>>>> .text:00000000000A429A                 mov     [rsp+8+var_8], rax
>>>>>> .text:00000000000A429E                 mov     [graph+5Ch], esi
>>>>>> .text:00000000000A42A1                 mov     rax, fs:28h
>>>>>> .text:00000000000A42AA                 cmp     rax, [rsp+8+var_8]
>>>>>> .text:00000000000A42AE                 jnz     short loc_A42B2
>>>>>> .text:00000000000A42B0                 pop     rax
>>>>>> .text:00000000000A42B1                 retn
>>>>>> .text:00000000000A42B2 ;
>>>>>> ---------------------------------------------------------------------------
>>>>>> .text:00000000000A42B2
>>>>>> .text:00000000000A42B2 loc_A42B2:                              ; CODE
>>>>>> XREF: avfilter_graph_set_auto_convert+1E↑j
>>>>>> .text:00000000000A42B2                 call    __stack_chk_fail
>>>>>> .text:00000000000A42B2 ; } // starts at A4290
>>>>>> .text:00000000000A42B2 avfilter_graph_set_auto_convert endp
>>>>>>
>>>>>> DynInst for some reason does not interpret the tailing call as
>>>>>> non-return. Digging to see what is going on,
>>>>>> will update here as I learn more :)
>>>>>>
>>>>>> Cheers,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 15, 2018 at 1:58 PM, Xiaozhu Meng <mxz297@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> On Thu, Mar 15, 2018 at 4:06 AM, Thomas Dullien
>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hey there,
>>>>>>>>
>>>>>>>> ok, I have looked at a few options on how to best tackle this, and
>>>>>>>> would love to solicit advice :-)
>>>>>>>>
>>>>>>>> - I tried SymtabCodeSource::addNonReturning("__stack_chk_fail");
>>>>>>>> this did not seem to have an effect.
>>>>>>>
>>>>>>>
>>>>>>> If you call SymtabCodeSource::addNonReturning("__stack_chk_fail")
>>>>>>> before calling CodeObject::parse(), this should work.
>>>>>>>
>>>>>>>>
>>>>>>>> - Looked at set_retstatus -- but that implies that the code is
>>>>>>>> already parsed?
>>>>>>>
>>>>>>>
>>>>>>> You are right that after CodeObject::parse() has finished, calling
>>>>>>> set_retstatus will only change the flag of return status for this function,
>>>>>>> will not re-parse the binary. So, we can focus on why
>>>>>>> SymtabCodeSource::addNonReturning("__stack_chk_fail") does not work.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 15, 2018 at 9:28 AM, Thomas Dullien
>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Ah. No. I just fund set_retstatus :-) -- please ignore my question
>>>>>>>>> :-)
>>>>>>>>>
>>>>>>>>> On Thu, Mar 15, 2018 at 9:26 AM, Thomas Dullien
>>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey there,
>>>>>>>>>>
>>>>>>>>>> after having my coffee, I realized: The proper way to do this is
>>>>>>>>>> to derive from CodeSource
>>>>>>>>>> and overload the nonReturning functions, I guess? :)
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 15, 2018 at 9:14 AM, Thomas Dullien
>>>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey there,
>>>>>>>>>>>
>>>>>>>>>>> I am running into troubles with disassembling executables
>>>>>>>>>>> generated by
>>>>>>>>>>> clang.3.8.1-24, for x64, with optimization set to size-optimize
>>>>>>>>>>> and stack cookies
>>>>>>>>>>> enabled.
>>>>>>>>>>>
>>>>>>>>>>> The trouble is that any function with an enabled stack cookie
>>>>>>>>>>> will end with a sequence
>>>>>>>>>>> of:
>>>>>>>>>>>
>>>>>>>>>>>   Epilogue to check stack cookie
>>>>>>>>>>>   jnz .fail
>>>>>>>>>>>   Rest of epilogue.
>>>>>>>>>>>   retn
>>>>>>>>>>> .fail:
>>>>>>>>>>>   call __stack_checkfail     // Does not return
>>>>>>>>>>>
>>>>>>>>>>> This leads to DynInst lumping all consecutive functions that use
>>>>>>>>>>> stack cookies
>>>>>>>>>>> into one huge CFG.
>>>>>>>>>>>
>>>>>>>>>>> Is there a way to tell DynInst that a particular function is not
>>>>>>>>>>> returning? I found
>>>>>>>>>>> that the parseAPI allows me to query if a function returns, but I
>>>>>>>>>>> did not find anything
>>>>>>>>>>> to "override" this behavior?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dyninst-api mailing list
>>>>>>>> Dyninst-api@xxxxxxxxxxx
>>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dyninst-api mailing list
>>>>>> Dyninst-api@xxxxxxxxxxx
>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>>>>
>>>>>
>>>>
>>>
>>
>

[← Prev in Thread] Current Thread [Next in Thread→]