Hi Bart -- I've been taking a look at memgaze.
I wanted to chat briefly about my current work on this,
to see if you might have some knowledge or intuition about
what I've discovered before I start a debug hunt on their
very large software stack and dyninst.
They are correct that dyninst sometimes inserts extra
code -- register spills and recoveries where none are
needed. The spills and recoveries are also accompanied
by a temporary stack area.
Comparing the generated code to the code that dyninst
emits in its source, it appears that in _some_ cases of
instrumentation dyninst is emitting a dyninst stack
area: room for the "dyninst virtual instrumentation
machine" to keep its chunk of registers on the stack,
along with saved processor state. That lets dyninst emit
instrumentation that uses the facility, and then restore
the machine state post-instrumentation.
Those operations might change CPU flags, so the
spill/restore is generated both for the "VM" and to keep
the CPU state constant, uninterrupted by the
instrumentation.
However, nothing like that is being generated -- no
dyninst instrumentation, just the instructions rendered
by memgaze to add a ptwrite. Still, my original
analysis -- that dyninst is trying to save machine state
(the flags register) across the instrumentation -- does
seem plausible as the cause in some cases:
    an instruction such as: perform comparison, set flags
    dyninst: allocate stack space via sp
    dyninst: spill rax and flags
    instrumentation: ptwrite some register
    dyninst: restore rax and flags
    dyninst: reclaim stack space
    jCOND somewhere
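For concreteness, here is roughly what that pattern looks
like in x86-64. This is my illustrative rendering, not
dyninst's verbatim output -- the registers, stack offsets,
and the pushfq/popfq choice for the flags are stand-ins
(dyninst may well save flags some other way):

    cmp    %rsi, %rdx         # application: compare, sets flags
    lea    -0x80(%rsp), %rsp  # dyninst: open scratch stack area
    mov    %rax, 0x60(%rsp)   # dyninst: spill rax
    pushfq                    # dyninst: spill flags
    ptwrite %rdx              # memgaze: inserted instrumentation
    popfq                     # dyninst: restore flags
    mov    0x60(%rsp), %rax   # dyninst: restore rax
    lea    0x80(%rsp), %rsp   # dyninst: reclaim scratch area
    jne    somewhere          # application: jCOND consuming flags

The irony, if I'm reading the SDM right, is that ptwrite
itself affects no flags (and writes no register), so even
here the spill protects nothing -- it would only matter if
dyninst emitted flag-modifying "VM" code in between, which
it doesn't.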
However, in other cases it does not do that. Is dyninst
performing dataflow analysis to try to avoid that
issue... even when it doesn't issue "dyninst virtual
machine instructions"?
Most of the problem rewrites appear to be in the c
library, not in the numeric code they are analyzing.
There is another case (it happens quite frequently) where
the intel string instructions are used -- in the c
library, again -- and the same type of code is emitted
around them, even though there is no context to be saved:
    setup rcx, rsi, rdi for a comparison
    dyninst: allocate stack space via sp
    dyninst: spill rax and flags
    instrumentation: add ptwrite on rdi, rsi
    dyninst: restore rax and flags
    dyninst: reclaim stack space
    repz cmpsb %es:(%rdi), %ds:(%rsi)
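Rendered concretely (same caveats as above -- registers,
counts, and offsets are stand-ins of mine):

    mov    $0x20, %ecx        # setup: count for repz
    mov    %r8, %rsi          # setup: source
    mov    %r9, %rdi          # setup: destination
    lea    -0x80(%rsp), %rsp  # dyninst: scratch stack area
    mov    %rax, 0x60(%rsp)   # dyninst: spill rax
    pushfq                    # dyninst: spill flags
    ptwrite %rdi              # memgaze: trace destination addr
    ptwrite %rsi              # memgaze: trace source addr
    popfq                     # dyninst: restore flags
    mov    0x60(%rsp), %rax   # dyninst: restore rax
    lea    0x80(%rsp), %rsp   # dyninst: reclaim
    repz cmpsb %es:(%rdi), %ds:(%rsi)   # sets its own flags

The incoming status flags are plainly dead here: cmpsb
generates the very flags that repz tests, and the only
flag it consumes is DF, which ptwrite can't change. Any
liveness analysis ought to see that.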
In other cases dyninst just adds the instrumentation
verbatim and everything is great. Those sites are mostly
in the numeric code; the unneeded spill/restores happen
in the c library code.
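For contrast, the "good" sites look like this -- the
surrounding numeric code is hypothetical, but the shape
is what I see:

    movsd  (%rax,%rbx,8), %xmm0  # application: load
    ptwrite %rax                 # memgaze: inserted bare
    mulsd  %xmm1, %xmm0          # application continues

And that is correct -- since ptwrite clobbers nothing,
the bare form is all that's ever needed.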
It's almost like the complexity of the branching in the
c library code is affecting dyninst's choices, or like
the possible-but-not-actual processor state save
mentioned earlier is to blame.
There was also a side issue: the version of objdump they
used disassembled the x86_64 instructions incorrectly,
making it appear that dyninst was emitting bad
instructions in the instrumentation. That is NOT the
case: when I disassembled by hand and then re-verified
with xed, dyninst was generating perfect code.
I didn't see anything in dyninst that would generate
invalid code of that nature. I'm checking to see if a
newer version of objdump fixes that bug.
Initially I couldn't build their software; it corrupted
my spack setup and caused some other problems. I've now
managed to compile their stack locally, so that I can
get a debugger on it and see what logic dyninst is using
in those "odd" cases.
Before I delve into that, though, I'd like to ask
whether you have some intuition or knowledge about how
dyninst chooses to add those kinds of "dyninst vm spills
and restores" to code -- when it isn't generating code
there using its instrumentation VM.
The difference between the sites looks a bit
un-systematic, but perhaps it is systematic and I'm just
missing something about dyninst's methodology that would
help with the analysis.
I've looked at the code, and I'm going to re-read some
of the dyninst papers first -- but I thought checking in
with you might be the thing to do, as you may have
better knowledge of this scenario than the papers do.
Bolo -- Josef T. Burger