Re: [DynInst_API:] BPatch_binaryEdit openBinary crashing


Date: Fri, 27 Feb 2015 15:27:34 +0100
From: Aleksandar Nikolic <nikolic.alek@xxxxxxxxx>
Subject: Re: [DynInst_API:] BPatch_binaryEdit openBinary crashing
Hi,

so after a bit more tinkering I got the instrumented binary to look almost good.
I think I understand what the next problem is, though.
Here is the piece of instrumentation code from my example, to explain myself
and check my reasoning:

.dyninst:00422014         dd 0
.dyninst:00422018         dd 0
.dyninst:0042201C
.dyninst:0042201C ; =============== S U B R O U T I N E =======================================
.dyninst:0042201C
.dyninst:0042201C
.dyninst:0042201C _main_1     proc near        ; CODE XREF: _mainj
.dyninst:0042201C
.dyninst:0042201C var_14      = dword ptr -14h
.dyninst:0042201C var_4       = dword ptr -4
.dyninst:0042201C
.dyninst:0042201C         lea   esp, [esp-14h]
.dyninst:00422020         mov   [esp+14h+var_4], eax
.dyninst:00422024         lea   eax, [esp+14h]
.dyninst:00422028         and   esp, 0FFFFFFF0h
.dyninst:0042202B         mov   [esp+14h+var_14], eax
.dyninst:0042202E         mov   eax, [eax-4]
.dyninst:00422031         pusha
.dyninst:00422032         push  1010h
.dyninst:00422037         push  ebp
.dyninst:00422038         mov   ebp, esp
.dyninst:0042203A         lea   esp, [esp-88h]
.dyninst:00422041         call  $+5
.dyninst:00422046
.dyninst:00422046 loc_422046:                  ; DATA XREF: _main_1+5Bw
.dyninst:00422046         pop   ecx
.dyninst:00422047         mov   eax, [ecx-32h]
.dyninst:0042204A         mov   edx, [eax]
.dyninst:0042204C         test  edx, edx
.dyninst:0042204E         jz    locret_422085
.dyninst:00422054         mov   edx, 0
.dyninst:00422059         mov   [eax], edx
.dyninst:0042205B         mov   edx, 0
.dyninst:00422060         push  edx
.dyninst:00422061         mov   [ebp-8], eax
.dyninst:00422064         mov   [ebp-0Ch], ecx
.dyninst:00422067         mov   ebx, [ebp-0Ch]
.dyninst:0042206A         mov   eax, [ebx-2Eh]
.dyninst:0042206D         call  eax

....
relocated code from main ...
.dyninst:0042208B         mov   esp, [esp+14h+var_14]
.dyninst:0042208E         push  ebp
.dyninst:0042208F         mov   ebp, esp
.dyninst:00422091         push  offset format  ; "hello"
.dyninst:00422096         call  _printf
...

.dyninst:00422100 ; Imports from C:\Program Files\Dyninst\lib\dyninstAPI_RT.dll
.dyninst:00422100 ;
.dyninst:00422100 DYNINST_bootstrap_info dd ?
.dyninst:00422104         align 8
.dyninst:00422108 ;
.dyninst:00422108 ; Imports from libInst.dll
.dyninst:00422108 ;
.dyninst:00422108 incFuncCoverage dd ?

In this test example I am using a stripped-down version of the codeCoverage tool;
it instruments the beginning of the function main, which just prints hello world.

There are two problems here, and I just want to see if my reasoning about them is correct before
I start addressing them.

The above code uses a get-PC construction (call $+5 followed by pop) to find itself in memory for PC-relative addressing,
and then retrieves the value from [ecx-32h], which in this case is the dword at 00422014. If my reading
and cross-referencing of the ELF code is correct, that address is supposed to contain a pointer
to DYNINST_bootstrap_info (see the PS note). But in this case, at runtime,
it contains NULL, even though the import is properly resolved at 00422100.
I've tracked down that this reloc is correctly recorded, but I guess it just isn't added to the produced
binary in emitWin.C. Is that reasoning correct?
The same issue applies to "call eax", which is supposed to call the incFuncCoverage instrumentation
function and gets the pointer from 00422018, which is, again, NULL, even though the 00422108 import slot
is properly resolved at runtime.
I guess the reloc info should be added to the binary to solve this, as manually fixing it
at runtime via a debugger actually makes the instrumentation function execute properly.
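
The address arithmetic in the two paragraphs above can be checked with a minimal C++ sketch. It uses only the addresses and displacements visible in the listing. The helper name pcRelativeSlot is made up for illustration and is not a Dyninst function.

```cpp
#include <cassert>
#include <cstdint>

// `call $+5` pushes the address of the next instruction, so the `pop ecx`
// that follows leaves ecx holding that address (00422046 in the listing).
// pcRelativeSlot is a hypothetical helper, not part of Dyninst.
constexpr std::uint32_t pcRelativeSlot(std::uint32_t callAddr, std::int32_t disp) {
    // The call instruction itself is 5 bytes long (E8 + rel32).
    return callAddr + 5u + static_cast<std::uint32_t>(disp);
}

// mov eax, [ecx-32h]: the slot that is read just before `test edx, edx`.
static_assert(pcRelativeSlot(0x00422041u, -0x32) == 0x00422014u, "first slot");
// mov eax, [ebx-2Eh]: the slot that feeds `call eax`.
static_assert(pcRelativeSlot(0x00422041u, -0x2E) == 0x00422018u, "second slot");
```

Both results match the two `dd 0` slots at the top of the listing, which is consistent with the slots being located correctly but never filled in.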

Another problem, and seemingly a bigger one, is the code that is copied from the main function.
In this example, the offset to "hello" would clearly require relocation info in the mutated binary.
Does Dyninst track this info when copying the code, or would that analysis need to be added too?
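
To make the concern concrete: `push offset format` embeds a 32-bit absolute address, so if the image is loaded at a base other than the one it was linked for, something has to add the difference to that immediate. PE images describe such fixups in their base relocation table. The following is only a sketch of that adjustment. The string address 0x0041E000 and both image bases are made-up values (the real ones are not shown in the listing), and applyFixup32 is a hypothetical helper that says nothing about how Dyninst records relocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value from a code buffer.
std::uint32_t readLE32(const std::vector<std::uint8_t>& code, std::size_t off) {
    assert(off + 4 <= code.size());
    return static_cast<std::uint32_t>(code[off])
         | static_cast<std::uint32_t>(code[off + 1]) << 8
         | static_cast<std::uint32_t>(code[off + 2]) << 16
         | static_cast<std::uint32_t>(code[off + 3]) << 24;
}

// Write a little-endian 32-bit value into a code buffer.
void writeLE32(std::vector<std::uint8_t>& code, std::size_t off, std::uint32_t v) {
    assert(off + 4 <= code.size());
    for (int i = 0; i < 4; ++i)
        code[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Hypothetical helper: the adjustment a loader makes for one 32-bit
// absolute fixup when the image is not loaded at its preferred base.
void applyFixup32(std::vector<std::uint8_t>& code, std::size_t off,
                  std::uint32_t preferredBase, std::uint32_t actualBase) {
    writeLE32(code, off, readLE32(code, off) + (actualBase - preferredBase));
}

// push imm32 is opcode 0x68 followed by the 4-byte address, so the fixup
// for the instruction at 00422091 would sit one byte later, at 00422092.
// 0x0041E000 is a made-up address for "hello".
std::vector<std::uint8_t> examplePush() {
    return {0x68, 0x00, 0xE0, 0x41, 0x00};
}
```

With an assumed preferred base of 0x00400000 and an assumed actual base of 0x00500000, applying the fixup at offset 1 turns the immediate 0x0041E000 into 0x0051E000. Without a relocation entry for that offset, the relocated copy of main would keep pushing the stale address.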

I'm getting to know the codebase pretty well, but there are obviously parts I haven't studied yet,
and I just wanted to make sure I didn't miss anything that is already being done.

PS
Should it be DYNINST_bootstrap_info or DYNINST_default_tramp_guards? I see the code that adds
DYNINST_default_tramp_guards, but somehow the DYNINST_bootstrap_info symbol gets added in the end.
Is this OK, or is it a separate bug?

Thanks,
Aleks

On Wed, Feb 25, 2015 at 9:33 PM, Bill Williams <bill@xxxxxxxxxxx> wrote:
On 02/25/2015 02:28 PM, Aleksandar Nikolic wrote:
I seem to have tracked down the cause of all my issues, at least
partially, to this piece of code in binaryEdit:

base += (1024*1024);
base -= (base & (1024*1024-1));

in openFile

Now, this base adjustment clearly has a purpose, but if commented out,
the instrumented PE file that is produced has a good import table
and good trampolines to instrumentation code.
I guess it's required for opening other (non PE) files?

That would be aligning up to the next 1MB boundary, which is an ELF requirement. If it doesn't hold for PE files, it should be okay to relax that.
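
As a quick illustration of what those two statements compute, here is a minimal C++ sketch. The function name nextBoundary, the 64-bit integer type, and the sample addresses are assumptions for illustration only; the 4 KB case is a hypothetical smaller alignment included for comparison, not a statement about what the PE path should use.

```cpp
#include <cassert>
#include <cstdint>

// The two statements from openFile, folded into one expression.
// nextBoundary is a hypothetical name, not a Dyninst function.
// `align` must be a power of two for the mask to work.
constexpr std::uint64_t nextBoundary(std::uint64_t base, std::uint64_t align) {
    // Adding `align` and then clearing the low bits always moves strictly
    // forward, even when `base` is already aligned.
    return (base + align) - ((base + align) & (align - 1));
}

constexpr std::uint64_t kOneMB = 1024 * 1024;

static_assert(nextBoundary(0x00421234, kOneMB) == 0x00500000, "unaligned base");
static_assert(nextBoundary(0x00400000, kOneMB) == 0x00500000, "aligned base still advances");
// Hypothetical 4 KB alignment, for comparison only.
static_assert(nextBoundary(0x00421234, 0x1000) == 0x00422000, "4 KB alignment");
```

The second check shows that the code skips a full megabyte even when the base is already on a boundary, so the new section can land a long way from the end of the original image.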

(Side note: we really ought to add format_elf and format_pe #defines where applicable, mapped appropriately to relevant OSes. If you feel particularly motivated, it's fine to use this as the initial motivating test case.)


On 02/25/2015 07:59 PM, Bill Williams wrote:

I'll take a look at the patches over the next couple of days, but this
all sounds very promising.

I don't have a definite answer for the trampoline issue, but I'd look at
whether there's a similar issue to the one with the imports where we
generated branches before .dyninst was fixed and didn't recalculate
them. The springboard code is very good at doing what it's told, so I'd
strongly suspect that we moved the section of relocated code after we
generated springboards.


It would seem that that is the case. If I fix the base address
"manually", it sort of works. As my patch for imports is hacky, is
there a part of the API that does the recalculations or should I do
them myself?

If memory serves, what we do on Linux is ensure that .dyninstInst is
created somewhere fixed (end of the binary, more or less), so that we
can actually generate the code correctly at instrumentation time. That's
going to be the safest/easiest approach, I think--otherwise we might
need to replace 5-byte near branches with longer code sequences and
wholly regenerate the section contents.
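
The dependence of a 5-byte near branch on the final section address can be sketched as follows. This is an illustration only: jmpRel32 and fitsRel32 are hypothetical helpers, not part of Dyninst, and the springboard address 0x00401000 used in the usage note is made up.

```cpp
#include <cassert>
#include <cstdint>

// Displacement stored in a 5-byte `jmp rel32` (opcode E9) placed at `from`
// and landing on `to`. It is measured from the end of the instruction.
// jmpRel32 is a hypothetical helper, not part of Dyninst.
constexpr std::int64_t jmpRel32(std::uint64_t from, std::uint64_t to) {
    return static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from + 5);
}

// Whether a displacement fits the signed 32-bit field of the instruction.
// If it does not, a longer code sequence is needed instead.
constexpr bool fitsRel32(std::int64_t disp) {
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

For a springboard at an assumed 0x00401000 that targets 0x0042201C, jmpRel32 gives 0x21017. If the section is later moved up by 0x100000, the correct displacement becomes 0x121017, so a branch generated before the move lands 1 MB short of its target. Fixing the section address before generating code avoids both that recalculation and the case where the displacement stops fitting in 32 bits.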


Cheers,
Alex

On 02/11/2015 06:20 PM, Matthew LeGendre wrote:

At one point, perhaps 6-7 years ago, a student had Windows binary
rewriting working to the point where you could do basic binary
rewriting on notepad.exe. They left before finishing the project, and
it was never feature complete nor functional on complicated binaries.
You're likely seeing the remains of that effort. I don't know how much
of that code is still valid or useful.

-Matt


On Wed, 11 Feb 2015, Aleksandar Nikolic wrote:
Hi,

looking at the codebase, a lot of code seems to already be there.
I'll be getting to know the code in more detail. Any pointers on
what would need to be implemented or what parts are missing?

Thanks,
Alex

On 02/08/2015 10:59 PM, Barton Miller wrote:
BTW, if there are any individuals or groups that would like to work on
getting rewriting to work on Windows, we'd be happy to provide support.
Not a small effort but interesting and worthwhile.

--bart


On 2/6/2015 4:36 PM, Bill Williams wrote:
No, and not exactly. Windows binary rewriting is not supported, and is
documented as such. If it were to be supported, what you are doing
would work quite reasonably.
_______________________________________________
Dyninst-api mailing list
Dyninst-api@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api

--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx
