Re: [DynInst_API:] Function Entry Point Recognition in Stripped Binaries


Date: Wed, 06 Jan 2016 11:08:43 -0600
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Function Entry Point Recognition in Stripped Binaries
On 01/06/2016 10:43 AM, Shuai Wang wrote:
Dear list,

I am writing to ask how to use DynInst to recognize function entry points (memory addresses) in stripped binaries.


I successfully installed the 32-bit build of DynInst 9.10, and I use the following DynInst code to iterate over all the functions and dump their entry point addresses from stripped binaries.

    .......
    vector<BPatch_module *> *modules = appImage->getModules();
    ......
    vector<BPatch_function *> *funcs = (*module_iter)->getProcedures();
    vector<BPatch_function *>::iterator func_iter;
    for (func_iter = funcs->begin(); func_iter != funcs->end(); ++func_iter) {
        char functionName[1024];
        (*func_iter)->getName(functionName, 1024);
        cout << "-- Function : " << functionName << " --" << endl;
    ......

I extract the function entry point addresses from the function names.                     
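
For readers who want to reproduce this step, here is a minimal sketch of pulling an address out of a synthesized function name. It assumes that functions without symbols are given a name made of a fixed prefix followed by the entry address in hexadecimal (for example "targ8048abc"); the "targ" default and the helper name entryFromName are assumptions for illustration, not something stated in this thread. BPatch_function also provides getBaseAddr(), which may avoid parsing names at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Dyninst): recover an entry address from a
// synthesized function name.
// ASSUMPTION: the name is `prefix` followed only by hex digits, e.g. "targ8048abc".
// Returns false, leaving `out` untouched, if the name does not match that shape.
bool entryFromName(const std::string& name, std::uint64_t& out,
                   const std::string& prefix = "targ") {
    // The name must start with the prefix and carry at least one digit.
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0)
        return false;
    // More than 16 hex digits would not fit in 64 bits.
    if (name.size() - prefix.size() > 16)
        return false;

    std::uint64_t addr = 0;
    for (std::size_t i = prefix.size(); i < name.size(); ++i) {
        const char c = name[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;             // not a hex digit: likely a real symbol name
        addr = (addr << 4) | digit;
    }
    out = addr;
    return true;
}
```

Names that do not match the pattern (for instance dynamic symbols that survive stripping) are rejected instead of being misread as addresses, so they need to be resolved some other way.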

I tested some coreutils binaries compiled with LLVM at the O2 optimization level, and the precision and recall are generally very good: precision 0.99, recall 0.91.
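
As a sketch of how such numbers are usually computed, assuming the standard definitions (precision is the fraction of reported entry points that are real, recall is the fraction of real entry points that were reported): the names Score and scoreEntries are hypothetical, and the ground-truth set is assumed to come from the unstripped version of the same binary.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical result type for the comparison below.
struct Score {
    double precision;  // true positives / all reported entry points
    double recall;     // true positives / all ground-truth entry points
};

// Compare the entry points a tool reported against a ground-truth set.
// ASSUMPTION: both sets hold addresses in the same address space.
Score scoreEntries(const std::set<std::uint64_t>& reported,
                   const std::set<std::uint64_t>& truth) {
    std::size_t truePositives = 0;
    for (std::uint64_t addr : reported)
        if (truth.count(addr) != 0)
            ++truePositives;

    Score s;
    // Define the ratio as 0 when its denominator is empty to avoid dividing by zero.
    s.precision = reported.empty() ? 0.0
                                   : static_cast<double>(truePositives) / reported.size();
    s.recall = truth.empty() ? 0.0
                             : static_cast<double>(truePositives) / truth.size();
    return s;
}
```

With three reported addresses of which two are real, and four real entry points in total, this gives a precision of 2/3 and a recall of 1/2.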

According to this paper, Section 6.2, DynInst achieves on average over 0.97 precision and 0.93 recall on 32-bit ELF binaries. That is very consistent with my test! But still, I am not sure whether I did everything correctly.

So here are my questions:

1. Since DynInst uses a machine learning method to recognize functions, it seems to need a training step before recognition, but I didn't do any training (although the results are pretty good). Is there anything in particular I have to do before using DynInst?

The training step has been done once and the resulting model is baked into the Dyninst code base. Your experimental setup should be correct.

2. If there is a "pre-trained" model installed in DynInst 9.10 already, what kinds of binaries was this model trained on? For example, can I use it on 32-bit ELF binaries compiled with LLVM at O3, or with ICC at O3?

Dyninst was trained on the test set of binaries produced by the BAP group at CMU, which includes binutils and coreutils binaries built with gcc and icc at O0 through O3 (as well as Windows binaries, though that's of course producing a separate model). I expect the model to generalize decently to LLVM binaries, and we'd be interested to hear your results. Our initial indications are that these models, applied to modern compiler versions, are not terribly sensitive to the toolchain used.

--bw


Am I clear enough? I would appreciate it if anyone could give me some help!

Sincerely,
Shuai

