Re: [DynInst_API:] Dyninst for dynamic analysis


Date: Wed, 13 Jan 2016 10:58:26 -0600
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Dyninst for dynamic analysis
On 01/13/2016 08:59 AM, svartanov@xxxxxxxxx wrote:
Dear all,

I am developing a dynamic analysis tool for defect detection in Java programs, covering both bytecode and the native code that the JVM uses through JNI.

Previously our team built the Avalanche tool for crash detection in C/C++ programs. It was built on top of Valgrind. Our approach is quite simple. Avalanche runs the target program with random input and extracts the full path conditions using dynamic instrumentation. The tool then inverts one of the branch conditions, generates new input data that satisfies the inverted condition, and repeats these steps for other execution paths.
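The loop described above can be sketched as a toy. This is not Avalanche's code: the `target` program, the `BranchRecord` type, and the brute-force `solveFlipped` search are all hypothetical stand-ins (a real tool would record conditions through instrumentation and hand them to a constraint solver), and the input is a single small integer.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// One recorded branch: the predicate over the input and the outcome observed.
struct BranchRecord {
    std::function<bool(int)> cond;
    bool taken;
};
typedef std::vector<BranchRecord> Path;

// Hypothetical target, hand-instrumented to log each branch it evaluates.
// It "crashes" (returns true) only when x > 100 and x is a multiple of 7.
bool target(int x, Path& path) {
    auto c1 = [](int v) { return v > 100; };
    path.push_back({c1, c1(x)});
    if (!c1(x)) return false;
    auto c2 = [](int v) { return v % 7 == 0; };
    path.push_back({c2, c2(x)});
    return c2(x);
}

// Stand-in for a constraint solver: search a small input range for a value
// that follows branches 0..k-1 as recorded and takes branch k the other way.
bool solveFlipped(const Path& path, std::size_t k, int& out) {
    for (int v = 0; v <= 1000; ++v) {
        bool ok = true;
        for (std::size_t i = 0; i < k && ok; ++i)
            ok = (path[i].cond(v) == path[i].taken);
        if (ok && path[k].cond(v) != path[k].taken) {
            out = v;
            return true;
        }
    }
    return false;
}

// Run, record the path, invert each branch in turn, and queue the new inputs
// until some input crashes the target or no unexplored input remains.
bool findCrash(int seed, int& crashInput) {
    std::vector<int> work(1, seed);
    std::set<int> seen;
    seen.insert(seed);
    while (!work.empty()) {
        int x = work.back();
        work.pop_back();
        Path path;
        if (target(x, path)) {
            crashInput = x;
            return true;
        }
        for (std::size_t k = 0; k < path.size(); ++k) {
            int v = 0;
            if (solveFlipped(path, k, v) && seen.insert(v).second)
                work.push_back(v);
        }
    }
    return false;
}
```

Starting from the seed 3, `findCrash` first flips `x > 100` to obtain 101, then flips `x % 7 == 0` on that path and reaches the crashing input 105.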

For this purpose we have to instrument every significant instruction (which means almost every instruction), and Valgrind is well suited to this.

For Java, I have modified the Avian virtual machine to extract a path condition trace during bytecode interpretation. Now I want to extract the same kind of trace from native code execution (native libraries used through JNI).

I could use Valgrind for this, but then I would have to run the whole JVM on top of Valgrind just to instrument the native functions, and the overhead would be too high.

I am impressed by Dyninst's ability to perform static instrumentation and to insert code into a running program. I wrote a couple of simple Dyninst instrumentation tools but ran into a problem with instrumentation points: I can't find a way to set an instrumentation point for every instruction in the target program. Is it possible to do this? If not in the current Dyninst version, maybe in future versions?

It's possible, though not ideal--you can use arbitrary instpoints to instrument before or after any instruction, by address. The overhead should be lower than Valgrind's, but it will still be high. The lookup function you want is BPatch_image::findPoints(Address, vector<BPatch_point*>&).

Or maybe you know some tricks for extracting the path condition trace without instrumenting every instruction? As I understand it, instrumentation points at every basic block start, or even at every memory access, are insufficient for full taint tracking and path condition trace dumping, even if I know the complete static instruction list of each basic block, because I still need run-time values such as register contents.

The best trick I can suggest is to analyze each basic block statically such that you can determine (e.g. via slicing and reaching definitions) which instructions are relevant to your taint analysis and path condition analysis, and at what points in the block the information you seek is valid. That will allow you to consolidate the instrumentation into fewer points, and possibly reuse values or otherwise reduce the workload.
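The per-block analysis suggested above can be illustrated with a much simplified backward slice from the branch condition. This sketch does not use Dyninst at all: the `Instr` struct and `relevantInstructions` function are hypothetical names, each instruction is assumed to write at most one register, and memory, aliasing, and partial registers are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model. A real tool would obtain
// operands from a disassembler rather than fill them in by hand.
struct Instr {
    std::string text;               // human-readable form, for display only
    std::string dst;                // register written ("" if none)
    std::vector<std::string> srcs;  // registers read
};

// Backward slice within one basic block: start from the registers the
// terminating branch depends on, walk the block bottom-up, and keep only
// the instructions that define a register that is currently needed.
std::vector<std::size_t> relevantInstructions(
        const std::vector<Instr>& block,
        const std::set<std::string>& branchInputs) {
    std::set<std::string> needed = branchInputs;
    std::vector<std::size_t> relevant;
    for (std::size_t i = block.size(); i-- > 0;) {
        const Instr& ins = block[i];
        if (!ins.dst.empty() && needed.count(ins.dst)) {
            needed.erase(ins.dst);  // this definition satisfies the need
            needed.insert(ins.srcs.begin(), ins.srcs.end());  // its inputs are needed now
            relevant.push_back(i);
        }
    }
    std::reverse(relevant.begin(), relevant.end());  // back to program order
    return relevant;
}
```

For a block consisting of `mov rax, [in]`, `mov rcx, 5`, `add rax, rbx`, `imul rdx, rcx`, `cmp rax, 10` (with `cmp` modeled as writing a `flags` register) and a branch that reads `flags`, the function returns indices 0, 2 and 4. Only those three instructions would need instrumentation for the path condition; the other two do not feed the branch.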

We've had some conversations over the years about how to automate the above process, so that Dyninst could automatically transform instrumentation into its most efficient equivalent form, but those have not turned into code--it's a hard problem in the general case.

--bw

Thank you.

Best regards,
Sergey Vartanov.


