Re: [DynInst_API:] Adding raw bytes before function

Mailing List Archives: UW Madison Computer Sciences Department, Computer Systems Lab

Date:	Tue, 30 Aug 2016 02:01:58 +0200
From:	Matthias Fischer <fischmat@xxxxxxxxx>
Subject:	Re: [DynInst_API:] Adding raw bytes before function

Hi,

No problem, and thank you for the idea of using a NOP instruction to force relocation. I am familiar with your paper; there you use the label REX; INT3; INT3; INT3 to store data in the lower 4 bits of the REX prefix (by the way, is it possible to omit two of those INT3 instructions, or to use REX; REX; REX; INT3?). However, I need either a full 5 bytes of space to store the information directly or, if that is not possible, at least 2 full bytes to use as an offset into a data lookup table. So I am still interested in the approach that introduces a variable for each function during relocation.
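For illustration, here is a minimal sketch of the label layout mentioned above. It assumes the 4-byte form REX; INT3; INT3; INT3, with the payload in the low nibble of the REX prefix byte (0x40 to 0x4F) and INT3 encoded as 0xCC. The helper names encode_label and decode_label are hypothetical and are not taken from the paper or from Dyninst.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers for illustration; not part of Dyninst or the paper's code.
constexpr std::uint8_t kRexBase = 0x40;  // REX prefixes occupy 0x40..0x4F
constexpr std::uint8_t kInt3    = 0xCC;  // one-byte breakpoint instruction

// Build the 4-byte label REX; INT3; INT3; INT3 carrying a 4-bit payload.
std::array<std::uint8_t, 4> encode_label(std::uint8_t payload) {
    std::array<std::uint8_t, 4> label = {
        static_cast<std::uint8_t>(kRexBase | (payload & 0x0F)), kInt3, kInt3, kInt3};
    return label;
}

// Return the 4-bit payload (0..15) if the bytes form a valid label, or -1 otherwise.
int decode_label(const std::array<std::uint8_t, 4>& bytes) {
    if ((bytes[0] & 0xF0) != kRexBase) return -1;  // first byte is not a REX prefix
    for (int i = 1; i < 4; ++i) {
        if (bytes[i] != kInt3) return -1;          // trailing bytes must all be INT3
    }
    return bytes[0] & 0x0F;
}
```

Under these assumptions a label carries only 4 bits, which is why a 2-byte or 5-byte tag needs a different mechanism.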

Thanks,
Matthias

On 30.08.2016 at 01:13, Victor van der Veen wrote:

Apologies for jumping in.

I have implemented forward-edge CFI invariants similar to what you need very recently with Dyninst for our S&P paper [1]. The trick we use is to add a NOP instruction to every function, which basically moves them to a shadow space. This allows us to overwrite the existing code space with a tag, using one of the lower-level process interfaces. A problem that arises is that not all code is moved: indirect jumps, for example, may jump back to the original code space. This means that the tag may still overwrite part of the function that lies before the function entry that you are tagging. I found that using only 2-byte tags did not break our programs, as such indirect jump targets would usually call or jump back to a function in the shadow space (tested on MySQL and node.js).
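As a rough sketch of the kind of tag check this enables, the following simulates the idea on a plain byte buffer rather than a real process: a 2-byte tag is written immediately before a function entry and read back relative to the call target. The buffer, the offsets, and the names write_tag and read_tag are illustrative assumptions; in a real tool the write would go through a process-level interface, and this is not the code from the paper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kTagSize = 2;  // assumed tag width in bytes

// Overwrite the two bytes directly before `entry` with `tag`.
// Returns false if the tag would not fit inside the buffer.
bool write_tag(std::vector<std::uint8_t>& code, std::size_t entry, std::uint16_t tag) {
    if (entry < kTagSize || entry > code.size()) return false;
    std::memcpy(code.data() + (entry - kTagSize), &tag, kTagSize);
    return true;
}

// Read the tag stored directly before `target`, as a check at an indirect
// call site would do with the runtime target address.
bool read_tag(const std::vector<std::uint8_t>& code, std::size_t target, std::uint16_t& tag) {
    if (target < kTagSize || target > code.size()) return false;
    std::memcpy(&tag, code.data() + (target - kTagSize), kTagSize);
    return true;
}
```

The bytes overwritten here are whatever preceded the entry, which is exactly the hazard described above when that code is still reachable.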

You may want to look into a similar direction. Our code should be open sourced at some point, but it is uncertain when exactly.

Best,
Victor

[1] http://vvdveen.com/publications/TypeArmor.pdf

On Aug 30, 2016 00:41, "Matthias Fischer" <fischmat@xxxxxxxxx> wrote:

Hi,

For the first variant, the only solution I see is some type of
associative container (something along the lines of a switch/case would
add too much overhead), and even that looks quite expensive and
non-trivial: a hash-based container might have the lowest overhead but
is not so simple, a tree-based one is also non-trivial, and binary
search on sorted pairs is possibly the easiest of the three but still
introduces non-negligible overhead. Even if I use C/C++ code for the
mapping function (then all associative containers would be rather
trivial), the overhead would probably still be too high.
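To make the "binary search on sorted pairs" option concrete, here is a minimal sketch of such a mapping function in plain C++. The table layout, the 16-bit tag type, and the name lookup_tag are assumptions for illustration only and do not use any Dyninst API.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// (function entry address, tag); the table must be sorted by address.
using TagEntry = std::pair<std::uint64_t, std::uint16_t>;

// Look up the tag for `target`. Returns true and sets `tag` if `target`
// is a known function entry; returns false otherwise.
bool lookup_tag(const std::vector<TagEntry>& table, std::uint64_t target, std::uint16_t& tag) {
    auto it = std::lower_bound(
        table.begin(), table.end(), target,
        [](const TagEntry& entry, std::uint64_t addr) { return entry.first < addr; });
    if (it == table.end() || it->first != target) return false;
    tag = it->second;
    return true;
}
```

Each lookup costs on the order of log2(n) comparisons for n functions, and that cost would be paid at every checked indirect call.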

The second variant looks promising, as long as all functions are
relocated, which I assume is the case ("we're redoing the layout of the
whole function anyway"), so I would appreciate some hints on where to
look to introduce changes.

Thanks,
Matthias

On 30.08.2016 at 00:09, Bill Williams wrote:
> Okay, I thought this looked like CFI; glad I'm not completely nuts. There are a couple of ways to do CFI in Dyninst without too much effort, I think. The first way would be a central map of [function entry address->tag] that's constructed in the mutator based on analysis and where you're inserting tags, and checked in the mutatee at any indirect control flow you want to validate. Doesn't need to be near anything, just needs to provide a map from target address->tag value. If you need locality, though, that doesn't work. The second, if you want to ensure that tags are at a fixed location with respect to the function entry point, would be to tweak the relocation classes to automatically add a tag variable to each function during relocation--we're redoing the layout of the whole function anyway. If that sounds more promising to you, I can give you some pointers to the right places to poke.
>
> --bw
>
>
> ________________________________________
> From: Matthias Fischer <fischmat@xxxxxxxxx>
> Sent: Monday, August 29, 2016 4:43 PM
> To: Bill Williams; dyninst-api@xxxxxxxxxxx
> Subject: Re: [DynInst_API:] Adding raw bytes before function
>
> Hi,
>
> What I want is not restricted to the call_site, but affects both the
> call_target and the call_site, as I cannot access the actual call_target
> of the call_site during analysis/patching (indirect call_sites):
>
> For each call_target(function):
>     BPatch_snippet tag = make_tag_snippet(call_target); // <- this one
> needs to be uniformly accessible during runtime of the mutatee, cannot
> be passed during analysis/patching
>     insert_snippet(call_target, tag);
>
>
> For each call_site (these are indirect):
>     BPatch_arithmeticExpr tag_address =
> BPatch_arithmeticExpr(BPatch_add, BPatch_dynamicTargetExpr(),
> BPatch_constExpr(offset));
>     BPatch_arithmeticExpr tag_value =
> BPatch_arithmeticExpr(BPatch_deref, tag_address);
>     BPatch_snippet check = make_check_snippet(tag_value, call_site);
>     insert_snippet(call_site, check);
>
> And the call_site needs to access the tag from the call_target, which I
> cannot calculate during my analysis/patching (=> dynamicTargetExpr
> during runtime of the mutatee), therefore I cannot simply use the tag
> expression from the call_target by passing it during analysis/patching.
>
> Thanks,
> Matthias
>
>
> On 29.08.2016 at 23:21, Bill Williams wrote:
>> If I'm understanding correctly, what you want is something like:
>>
>> for each call site:
>>     BPatch_variableExpr tag = /something/
>>     BPatch_snippet do_work = make_snippet(getDynamicTarget(call site), tag, ...)
>>     insert_snippet(call site, do_work)
>>
>> BPatch_malloc will hand back a tag that's guaranteed to be in a safe location, with a known type. It will be in the Dyninst private heap and will not collide with relocated code.
>>
>> BPatch_createVariable is a placement new, effectively--it will create a variable expression of a given type at the address you hand it. There is no safety checking on that; it's intended to be used either in a region you've created or to point to a heap location. It will not be relocated.
>>
>> Relocation occurs on a per-function basis and happens when instrumentation is inserted (either at the time of the insert or at the time of insertionset::finalize). Springboards to relocated code in the form of either traps or branches exist for all relocated code.
>>
>> Is my understanding above correct? If so, can you fill in the details that make BPatch_malloc not the right tool for the job? If not, can you give me pseudocode that explains your use case better?
>>
>> --bw
>>
>> ________________________________________
>> From: Matthias Fischer <fischmat@xxxxxxxxx>
>> Sent: Monday, August 29, 2016 2:44 PM
>> To: Bill Williams; dyninst-api@xxxxxxxxxxx
>> Subject: Re: [DynInst_API:] Adding raw bytes before function
>>
>> Hi,
>>
>> thank you for your answer, but I do not think that BPatch_malloc can
>> help me there, because every function in the binary will have a
>> (possibly) different tag, based on collected information and
>> BPatch_malloc does not allow me to control or predict the address as far
>> as I can tell.
>> Furthermore, I need each indirect call_site to access this tag in a
>> uniform way before control is transferred to the call_target (this I can
>> achieve with BPatch_dynamicTargetExpr, when the tag is always at the
>> same position relative to the call_target's address).
>>
>> Now, I am assuming that code relocation will occur when a snippet and
>> actual code overlap, which will move the existing code into a new
>> location (the previous place is probably filled with trap instructions).
>> However, to prevent breaking indirect callsites, I am assuming that at
>> the previous function location there still needs to be something to
>> redirect control to the actual target (probably a direct jump). Therefore
>> I assume that BPatch_createVariable does not allow all addresses, but
>> only those, which are "safe" aka not the jump instruction that keeps
>> indirect calls from breaking. Is this correct so far?
>> In case the above is correct, is there a way to determine the set of
>> "safe" addresses?
>>
>> Furthermore, will the address I give to BPatch_createVariable always
>> contain exactly this variable (short of removing/overwriting it
>> manually) or can it be subject to relocation? Are there any other
>> pitfalls regarding BPatch_createVariable?
>>
>> Thanks,
>> Matthias
>>
>> On 29.08.2016 at 18:02, Bill Williams wrote:
>>> Matthias--
>>>
>>> Unless you have very specific location constraints, using BPatch_malloc instead of BPatch_createVariable to create and allocate space for a BPatch_variableExpr should handle all of the bookkeeping for you. If you do have constraints, there are internal mechanisms that will help (we try to allocate relocated code near the original, for instance) but those aren't currently exposed; I'd want to understand your use case better before trying to push a constrained-malloc interface out to the public BPatch classes.
>>>
>>> Let me know if you've got further questions.
>>>
>>> --bw
>>>
>>> ________________________________________
>>> From: Dyninst-api <dyninst-api-bounces@xxxxxxxxedu> on behalf of Matthias Fischer <fischmat@xxxxxxxxx>
>>> Sent: Monday, August 29, 2016 9:30 AM
>>> To: dyninst-api@xxxxxxxxxxx
>>> Subject: [DynInst_API:] Adding raw bytes before function
>>>
>>> Hi,
>>>
>>> I'd like to write information to a binary (or process) for each function
>>> in a way that allows access from an indirect callsite with minimal overhead.
>>>
>>> My current idea is to write raw bytes before the actual function code,
>>> so the bytes are stored at [callsite_target - offset, callsite_target[.
>>> Then I can access the information as a BPatch_variableExpr using
>>> BPatch_dynamicTargetExpr to calculate the address. However, I cannot
>>> simply write the bytes at this address without possibly overwriting
>>> existing instructions - the code before the function would have to be
>>> relocated.
>>>
>>> So far, I have not found a way to write raw bytes at a specific address
>>> with relocation of possible overwritten instructions. Is that even
>>> possible with dyninstAPI or is there another way to achieve my initial goal?
>>>
>>> Thanks,
>>> Matthias
>>>
>>>
>>> _______________________________________________
>>> Dyninst-api mailing list
>>> Dyninst-api@xxxxxxxxxxx
>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>
