Re: [DynInst_API:] Telling DynInst a particular function is non-returning


Date: Mon, 19 Mar 2018 19:44:54 +0100
From: Thomas Dullien <thomasdullien@xxxxxxxxxx>
Subject: Re: [DynInst_API:] Telling DynInst a particular function is non-returning
Hey there,

excellent, and thanks for the quick reply :-)

I will send a pull request for an easy cleanup (3) :-) and not
reformat until there's a good substantial change.

On (1), I have dug a bit further:

It appears that "Object" contains two relocation tables as members,
fbt_ and relocation_table_. Symtab seems to initialize
it's own member relocation_table_ by copying the contents of
Object.relocation_table_ in the one of the constructors; in
extract_info() it
seems to use Object.fbt_. The constructor that is hit in my case
(around line 1245 in 9.3.2) does not copy any entries. If I add
an extra loop to do it, at least the fbt_ in
SymtabCodeSource::init_linkage is no longer empty -- but the offsets
in the relocationEntry
are slightly off then, e.g. we'd need to use rel_addr() instead of
target_addr() as key?

I hope this helps somewhat,
Cheers,
Thomas

On Mon, Mar 19, 2018 at 6:56 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:
> For 1) I'd have to do some code diving, but I can do a lightning round of policy answers for 2-5:
>
> 2) There's not presently good documentation for this code, though I've got a paper on the back burner that will address that somewhat. I can give you an overview later when I'm not fighting all the fires.
> 3) Easy cleanups should always be fine.
> 4) We tend to prefer that reformatting come along with some substantial changes (or at a minimum substantial review). Please treat four space indents as standard if you do this. The worse the formatting of a file is the lower the bar for fixing it.
> 5) Sure, lifting things to better/cleaner style is also always fine.
>
> --bw
> ________________________________________
> From: Dyninst-api <dyninst-api-bounces@xxxxxxxxxxx> on behalf of Thomas Dullien <thomasdullien@xxxxxxxxxx>
> Sent: Monday, March 19, 2018 11:08 AM
> To: Xiaozhu Meng
> Cc: dyninst-api
> Subject: Re: [DynInst_API:] Telling DynInst a particular function is non-returning
>
> Hey there,
>
> after some more digging, some more questions :-) -- some pertaining to
> my concrete question, some to the codebase
> in general.
>
> 1) I *think* I have identified the problem:
>
> Parser.C
> 1009: is_plt = HASHDEF(plt_entries,target);
> 1011: // CodeSource-defined tests
> 1012: is_nonret = obj().cs()->nonReturning(target);
>
> In my example run, plt_entries is empty when the code evaluates
> target=0x1030 (which is a jmp in the plt
> to __stack_chk_fail). Which part of the code would be normally
> responsible for initializing the plt entries? (see attached
> binary).
>
> This seems to happen because obj.cs()->linkage() seems to be empty
> during the construction of the Parser.
>
> The underlying culprit seems to be somewhere in SymtabCodeSource.C --
> normally, one would think that init_linkage
> would be responsible for filling the map. This function calls into
> Symtab::getFuncBindingTable, which ... strangely ...
> seems to only deal with a relocation table, not with a PLT ?
>
> So my question is: What part of the code is responsible for
> initializing the table with plt entries? :-)
>
> 2) Is there any documentation or rough overview about how the
> disassembler inside ParseAPI works? It seems
> the "workhorse" is parse_frames and parse_frame, but there is a fair
> bit of complexity in those functions, and not
> trivial to get a high-level overview :-)
>
> 3) Is it OK to submit pull requests with (semantics-preserving) code
> cleanups? There are a few places with
> pretty ugly hacks (for example the strcmp-statements around
> SymtabCodeSource.C below the "Achin added code" comment),
> and if nobody minds I'd clean them up :-)
>
> 4) Is it OK to submit pull requests with reformatting? The code seems
> to currently mix tabs and spaces in different files,
> and standardizing indentation on one of the two would make sense? The
> downside of such reformatting-commits is that
> they mess with git blame ...
>
> 5) Is it OK to submit pull requests that clean up iterator-based loops
> to for-in loops? E.g. is it OK to lift the code to use
> C++11 whenever it makes the code clearer / prettier?
>
> Cheers,
> Thomas
>
>
> On Fri, Mar 16, 2018 at 4:56 PM, Thomas Dullien
> <thomasdullien@xxxxxxxxxx> wrote:
>> Hey there,
>>
>> update: It seems that the obj.cs()->linkage() at the beginning of
>> Parser::Parser returns an empty
>> plt ?
>>
>> Cheers,
>> Thomas
>>
>> On Fri, Mar 16, 2018 at 4:39 PM, Thomas Dullien <thomasdullien@xxxxxxxxxx>
>> wrote:
>>>
>>> Hey there,
>>>
>>> ok, tracing this a bit further, I have some more questions :-)
>>>
>>> If I am not mistaken, the code around Parser.C, line 1013 is supposed to
>>> deal with calls to non-returning targets?
>>>
>>> is_nonret = obj().cs()->nonReturning(target);
>>> if (is_nonret) {
>>>  ....
>>>
>>> Now, in my example, for some reason the .plt.got stub function that jmps
>>> straight to cs:__stack_chk_fail is not a
>>> function:
>>>
>>> .plt.got:0000000000001030 __stack_chk_fail proc near              ; CODE
>>> XREF: postproc_version:loc_11D0âp
>>> .plt.got:0000000000001030                                         ;
>>> postproc_configuration:loc_1202âp ...
>>> .plt.got:0000000000001030                 jmp     cs:__stack_chk_fail_ptr
>>> .plt.got:0000000000001030 __stack_chk_fail endp
>>>
>>> So when nonReturning(0x1030) is called, no function gets returned in
>>> SymtabCodeSource::nonReturning while
>>> calling findFuncByEntryOffset, and then the code mistakenly adds the
>>> fallthrough edge.
>>>
>>> I am still finding my way through the codebase - but I presume there is
>>> some initial pass that is supposed to create
>>> functions from the stubs in the PLT and check them for non-returning
>>> status?
>>>
>>> Cheers,
>>> Thomas
>>>
>>>
>>>
>>> On Thu, Mar 15, 2018 at 6:33 PM, Thomas Dullien <thomasdullien@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hey there,
>>>>
>>>> one quick question: After backporting the 3 commits, I tested with both
>>>> recursive-mode disassembly and non-recursive mode
>>>> disassembly, and neither of the two worked. My question is now: Is this
>>>> even supposed to work in non-recursive mode? E.g.
>>>> if we are not recursing into a called function, we can't tell whether it
>>>> returns or not, right?
>>>>
>>>> Cheers,
>>>> Thomas
>>>>
>>>> On Thu, Mar 15, 2018 at 2:23 PM, Thomas Dullien
>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>
>>>>> Hey there,
>>>>>
>>>>> ah, cool :-). I backported the 3 commits into the 9.3.2 codebase, but it
>>>>> does not seem to solve the issue just yet.
>>>>>
>>>>> I will debug a bit more and report back once I understand what is going
>>>>> on.
>>>>>
>>>>> Cheers,
>>>>> Thomas
>>>>>
>>>>> On Thu, Mar 15, 2018 at 2:09 PM, Xiaozhu Meng <xmeng@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Ah, I have a few pending commits related to this issue. Please look at
>>>>>> my pull request at https://github.com/dyninst/dyninst/pull/437.
>>>>>>
>>>>>> Not all commits are relevant to you, but bbac557, 78b1bd1, da50a37
>>>>>> should be relevant.
>>>>>>
>>>>>> For parsing_printf, it is "DYNINST_DEBUG_PARSING".
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 15, 2018 at 8:02 AM, Thomas Dullien
>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hey there,
>>>>>>>
>>>>>>> ok, rebuilding DynInst 9.3.2 now to investigate why adding the string
>>>>>>> does not seem to have any effect.
>>>>>>> I looked at the source code, and I *think* the function is already in
>>>>>>> the list of non-returning functions.
>>>>>>>
>>>>>>> Checking what is going on. Example setup:
>>>>>>>
>>>>>>> .plt.got:0000000000053CF8 __stack_chk_fail proc near              ;
>>>>>>> CODE XREF: init:loc_54673âp
>>>>>>> .plt.got:0000000000053CF8                                         ;
>>>>>>> uninit:loc_54705âp ...
>>>>>>> .plt.got:0000000000053CF8                 jmp
>>>>>>> cs:__stack_chk_fail_ptr
>>>>>>> .plt.got:0000000000053CF8 __stack_chk_fail endp
>>>>>>>
>>>>>>> .text:00000000000A4290 var_8           = qword ptr -8
>>>>>>> .text:00000000000A4290
>>>>>>> .text:00000000000A4290 graph = rdi                             ;
>>>>>>> AVFilterGraph_0 *
>>>>>>> .text:00000000000A4290 flags = rsi                             ;
>>>>>>> unsigned int
>>>>>>> .text:00000000000A4290 ; __unwind {
>>>>>>> .text:00000000000A4290                 push    rax
>>>>>>> .text:00000000000A4291                 mov     rax, fs:28h
>>>>>>> .text:00000000000A429A                 mov     [rsp+8+var_8], rax
>>>>>>> .text:00000000000A429E                 mov     [graph+5Ch], esi
>>>>>>> .text:00000000000A42A1                 mov     rax, fs:28h
>>>>>>> .text:00000000000A42AA                 cmp     rax, [rsp+8+var_8]
>>>>>>> .text:00000000000A42AE                 jnz     short loc_A42B2
>>>>>>> .text:00000000000A42B0                 pop     rax
>>>>>>> .text:00000000000A42B1                 retn
>>>>>>> .text:00000000000A42B2 ;
>>>>>>> ---------------------------------------------------------------------------
>>>>>>> .text:00000000000A42B2
>>>>>>> .text:00000000000A42B2 loc_A42B2:                              ; CODE
>>>>>>> XREF: avfilter_graph_set_auto_convert+1Eâj
>>>>>>> .text:00000000000A42B2                 call    __stack_chk_fail
>>>>>>> .text:00000000000A42B2 ; } // starts at A4290
>>>>>>> .text:00000000000A42B2 avfilter_graph_set_auto_convert endp
>>>>>>>
>>>>>>> DynInst for some reason does not interpret the tailing call as
>>>>>>> non-return. Digging to see what is going on,
>>>>>>> will update here as I learn more :)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 15, 2018 at 1:58 PM, Xiaozhu Meng <mxz297@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Thomas,
>>>>>>>>
>>>>>>>> On Thu, Mar 15, 2018 at 4:06 AM, Thomas Dullien
>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hey there,
>>>>>>>>>
>>>>>>>>> ok, I have looked at a few options on how to best tackle this, and
>>>>>>>>> would love to solicit advice :-)
>>>>>>>>>
>>>>>>>>> - I tried SymtabCodeSource::addNonReturning("__stack_chk_fail");
>>>>>>>>> this did not seem to have an effect.
>>>>>>>>
>>>>>>>>
>>>>>>>> If you call SymtabCodeSource::addNonReturning("__stack_chk_fail")
>>>>>>>> before calling CodeObject::parse(), this should work.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Looked at set_retstatus -- but that implies that the code is
>>>>>>>>> already parsed?
>>>>>>>>
>>>>>>>>
>>>>>>>> You are right that after CodeObject::parse() has finished, calling
>>>>>>>> set_retstatus will only change the flag of return status for this function,
>>>>>>>> will not re-parse the binary. So, we can focus on why
>>>>>>>> SymtabCodeSource::addNonReturning("__stack_chk_fail") does not work.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 15, 2018 at 9:28 AM, Thomas Dullien
>>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Ah. No. I just fund set_retstatus :-) -- please ignore my question
>>>>>>>>>> :-)
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 15, 2018 at 9:26 AM, Thomas Dullien
>>>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey there,
>>>>>>>>>>>
>>>>>>>>>>> after having my coffee, I realized: The proper way to do this is
>>>>>>>>>>> to derive from CodeSource
>>>>>>>>>>> and overload the nonReturning functions, I guess? :)
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 15, 2018 at 9:14 AM, Thomas Dullien
>>>>>>>>>>> <thomasdullien@xxxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey there,
>>>>>>>>>>>>
>>>>>>>>>>>> I am running into troubles with disassembling executables
>>>>>>>>>>>> generated by
>>>>>>>>>>>> clang.3.8.1-24, for x64, with optimization set to size-optimize
>>>>>>>>>>>> and stack cookies
>>>>>>>>>>>> enabled.
>>>>>>>>>>>>
>>>>>>>>>>>> The trouble is that any function with an enabled stack cookie
>>>>>>>>>>>> will end with a sequence
>>>>>>>>>>>> of:
>>>>>>>>>>>>
>>>>>>>>>>>>   Epilogue to check stack cookie
>>>>>>>>>>>>   jnz .fail
>>>>>>>>>>>>   Rest of epilogue.
>>>>>>>>>>>>   retn
>>>>>>>>>>>> .fail:
>>>>>>>>>>>>   call __stack_checkfail     // Does not return
>>>>>>>>>>>>
>>>>>>>>>>>> This leads to DynInst lumping all consecutive functions that use
>>>>>>>>>>>> stack cookies
>>>>>>>>>>>> into one huge CFG.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a way to tell DynInst that a particular function is not
>>>>>>>>>>>> returning? I found
>>>>>>>>>>>> that the parseAPI allows me to query if a function returns, but I
>>>>>>>>>>>> did not find anything
>>>>>>>>>>>> to "override" this behavior?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dyninst-api mailing list
>>>>>>>>> Dyninst-api@xxxxxxxxxxx
>>>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dyninst-api mailing list
>>>>>>> Dyninst-api@xxxxxxxxxxx
>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

[← Prev in Thread] Current Thread [Next in Thread→]