On 02/27/2015 08:27 AM, Aleksandar Nikolic wrote:
Hi ,
so after a bit more tinkering I got the instrumented binary to look
almost good.
I think I understand what the next problem is tho.
Here is the piece of instrumentation code in my example to explain my self
and check my reasoning:
.dyninst:00422014 dd 0
.dyninst:00422018 dd 0
.dyninst:0042201C
.dyninst:0042201C ; =============== S U B R O U T I N E
=======================================
.dyninst:0042201C
.dyninst:0042201C
.dyninst:0042201C _main_1 proc near ; CODE XREF:
_mainj
.dyninst:0042201C
.dyninst:0042201C var_14 = dword ptr -14h
.dyninst:0042201C var_4 = dword ptr -4
.dyninst:0042201C
.dyninst:0042201C lea esp, [esp-14h]
.dyninst:00422020 mov [esp+14h+var_4], eax
.dyninst:00422024 lea eax, [esp+14h]
.dyninst:00422028 and esp, 0FFFFFFF0h
.dyninst:0042202B mov [esp+14h+var_14], eax
.dyninst:0042202E mov eax, [eax-4]
.dyninst:00422031 pusha
.dyninst:00422032 push 1010h
.dyninst:00422037 push ebp
.dyninst:00422038 mov ebp, esp
.dyninst:0042203A lea esp, [esp-88h]
.dyninst:00422041 call $+5
.dyninst:00422046
.dyninst:00422046 loc_422046: ; DATA XREF:
_main_1+5Bw
.dyninst:00422046 pop ecx
.dyninst:00422047 mov eax, [ecx-32h]
.dyninst:0042204A mov edx, [eax]
.dyninst:0042204C test edx, edx
.dyninst:0042204E jz locret_422085
.dyninst:00422054 mov edx, 0
.dyninst:00422059 mov [eax], edx
.dyninst:0042205B mov edx, 0
.dyninst:00422060 push edx
.dyninst:00422061 mov [ebp-8], eax
.dyninst:00422064 mov [ebp-0Ch], ecx
.dyninst:00422067 mov ebx, [ebp-0Ch]
.dyninst:0042206A mov eax, [ebx-2Eh]
.dyninst:0042206D call eax
....
relocated code from main ...
.dyninst:0042208B mov esp, [esp+14h+var_14]
.dyninst:0042208E push ebp
.dyninst:0042208F mov ebp, esp
.dyninst:00422091 push offset format ; "hello"
.dyninst:00422096 call _printf
...
.dyninst:00422100 ; Imports from C:\Program
Files\Dyninst\lib\dyninstAPI_RT.dll
.dyninst:00422100 ;
.dyninst:00422100 DYNINST_bootstrap_info dd ?
.dyninst:00422104 align 8
.dyninst:00422108 ;
.dyninst:00422108 ; Imports from libInst.dll
.dyninst:00422108 ;
.dyninst:00422108 incFuncCoverage dd ?
In this test example I am using a stripped down version of codeCoverage
tool,
it instruments the begining of function main which just prints hello world.
There are two problems here and I just want to see if my reasoning about
them is correct before
I start addressing them.
The above code is using getpc construction to find itself in the memory
to do pc relative addressing
and then retrieves the value from [ecx-32h] which in this case will be
00422014. If my reading
and crossreferencing of the ELF code is correct, that address is
supposed to contain a pointer
to DYNINST_bootstrap_info (See PS note). But in this case , at runtime
it points to NULL, even tho the import is properly resolved at 00422104.
I've tracked down that this reloc is correctly recorded, but I guess it
just isn't added to the produced
binary in emitWin.C. Is that reasoning correct?
I certainly don't see anything in emitWin that handles relocs, and we
obviously need to be able to add them.
Added note: the push 1010 should be pushing the original return address
onto the stack, if I'm not mistaken. That's something that should, on
Windows, get a relocation recorded but doesn't.
Finally, on Windows, we probably ought to be using relocations in
preference to PC-relative addressing.
The same issue is with "call eax" which is supposed to call the
incFuncCoverage instrumentation
function and gets the pointer from 00422018 which is , again, NULL, but
hte 00422108 import slot
is properly resolved at runtime.
I guess the reloc info should be added to the binary to solve this, as
manually fixing it
at runtime via debugger actually makes the instrumentation function
execute properly.
Another problem, and bigger one as it seems, is the code that is copied
from the main function.
In this example, the offset to "hello" would clearly require relocation
info in the mutated binary.
Does dyninst track this info when copying the code, or would that
analysis need to be added too?
Dyninst does not track that specifically, but we do have data structures
in the relocation system that map original addresses to relocated ones.
If you look at addressSpace::relocate and follow the flow from there,
you should see where that data's kept. I think what we want is a couple
of passes in generating the new relocs table: one where we add all of
the new relocs in code we generated (which involves codegen modification
to keep track of this, yes), and one where for each original reloc, we
modify it to point to the corresponding instruction's final location. We
may need a third pass for springboards, come to think of it--anything
that we're reaching through any form of indirect jump (including a
push/ret combination) needs relocation info in order to deal with ASLR.
I'm getting to know the codebase pretty well, but there are obviously
parts I haven't studied yet,
and just wanted to know that I didn't miss anything that is already
being done.
PS
Should it be DYNINST_bootstrap_info or DYNINST_default_tramp_guards? I
see the code that is adding
DYNINST_default_tramp_guards, but somehow DYNINST_bootstrap_info symbol
gets added in the end.
Is this ok, or it's a separate bug?
Thanks,
Aleks
|