Re: [DynInst_API:] memgaze issues


Date: Mon, 18 Sep 2023 15:39:06 -0500
From: Barton Miller <bart@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] memgaze issues
Hi Bolo.

This should be a larger group discussion, so I am cc'ing the group. I'd like to bring the collective expertise to bear on this one.
Everyone: I have a conflict this Weds morning, so I propose we move the 
meeting time to 1pm on Weds.
--bart

On 9/18/2023 11:57 AM, Bolo wrote:
Hi Bart -- I've been taking a look at memgaze.

I wanted to chat briefly about my current work on this,
to see if you might have some knowledge or intuition about
what I've discovered before I start a debug hunt on their
very large software stack and dyninst.


They are correct that sometimes dyninst is inserting extra
code and register spill/recovery where they are not needed.

The register spills and recoveries are also accompanied
by a temporary stack area.

Comparing the generated code against the code-emission logic
in the dyninst source, it appears that in _some_ cases of
instrumentation dyninst is emitting a dyninst stack
area, to hold the "dyninst virtual instrumentation
machine"'s chunk of registers on the stack, as well as saving
processor state, so that it could then emit dyninst instrumentation
that uses that facility... and be able to restore the machine state
post-instrumentation.

	Those operations might change CPU flags, so that spill/restore
	is generated for the "VM" and to maintain a constant CPU state,
	uninterrupted by the instrumentation.

However, nothing like that is being generated -- no
dyninst instrumentation, just the instructions rendered by
memgaze to add a ptwrite.  My original analysis, that perhaps
dyninst is trying to save machine state (the flags register)
across instrumentation, does seem to explain
some cases:

	an instruction such as perform comparison and set flags
		dyninst allocate stack space via sp
		spill rax and flags
		instrumentation (ptwrite some register)
		dyninst restore rax and flags
		dyninst reclaim stack space
	jCOND somewhere

However, in other cases it does not do that.  Is dyninst performing
dataflow analysis to try to avoid that issue... even when it
doesn't issue "dyninst virtual machine instructions"?
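
To make the dataflow question concrete, here is a toy sketch (Python,
not dyninst code) of a backward liveness check for the arithmetic
flags over one straight-line block.  The Insn class, the
flags_live_before function, and the reads/writes annotations are all
hypothetical names made up for illustration; whether dyninst does
anything like this at these sites is exactly what is being asked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction record: only tracks arithmetic-flag use."""
    text: str
    reads_flags: bool = False
    writes_flags: bool = False


def flags_live_before(block, index, live_out=True):
    """Return True if the flags are live just before block[index].

    Walks backward from the end of the block to the instrumentation
    point.  live_out=True is the conservative assumption that some
    successor block may still read the flags.
    """
    live = live_out
    for insn in reversed(block[index:]):
        if insn.reads_flags:
            live = True        # a reader makes the incoming flags live
        elif insn.writes_flags:
            live = False       # a pure writer kills the incoming flags
    return live


# Case 1: instrumentation lands between the compare and the jCOND.
cmp_site = [
    Insn("cmp %rax, %rbx", writes_flags=True),
    Insn("jCOND somewhere", reads_flags=True),
]

# Case 2: instrumentation lands just before the string compare.
# Simplification: repz cmpsb is modelled as always writing the flags.
string_site = [
    Insn("setup rcx, rsi, rdi"),
    Insn("repz cmpsb %es:(%rdi), %ds:(%rsi)", writes_flags=True),
]

needs_spill_cmp = flags_live_before(cmp_site, 1)
needs_spill_string = flags_live_before(string_site, 1)
```

Under this simplified model the cmp/jCOND site needs the flags
saved across the instrumentation (needs_spill_cmp is True), while
the site before the string compare does not (needs_spill_string is
False).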

Most of the problem rewrites appear to be in the C library,
not in the numeric code they are analyzing.

There is another case (this happens quite frequently) when the Intel
string instructions are being used (in the C library, actually)... and
the same type of code is emitted around them -- even though there is
no context to be saved:

	setup rcx, rsi, rdi for a comparison
		dyninst allocate stack space via sp
		spill rax and flags
		instrumentation (add ptwrite on rdi, rsi)
		dyninst restore rax and flags
		dyninst reclaim stack space
	repz cmpsb %es:(%rdi), %ds:(%rsi)

In other cases dyninst just adds the instrumentation verbatim
and everything is great.  That is mostly in the numeric code,
not in the C library code, where the unneeded spill/restores happen.

	It's almost as if the complexity of the branching in the
	C library code is affecting dyninst's choices, or as if
	dyninst is doing the possible (but not actually needed)
	processor state save mentioned earlier.


There was also a side issue that the version of objdump they used
disassembled the x86_64 instructions incorrectly, making it appear that
dyninst was emitting instructions incorrectly in the instrumentation.
	
	That is NOT the case: when I disassembled by hand and then
	re-verified with xed, dyninst was generating perfect code.
	I didn't see anything in dyninst that would generate
	invalid code of that nature.   I'm checking to see if a
	newer version of objdump fixes that bug.



Initially I couldn't build their software; it corrupted my spack
and caused some other problems.  I've now managed to compile
their stack locally, so that I can get a debugger on it and see
what logic dyninst is using in those "odd" cases.

Before I delve into that, however, I would like to ask
whether you have some intuition or knowledge about how
dyninst chooses to add those kinds of "dyninst vm spills
and restores" to code -- when it isn't generating code
there using its instrumentation VM.

	The difference between the sites looks a bit un-systematic,
	but perhaps it is systematic and I'm missing something about
	dyninst's methodology that would help with the analysis.

I've looked at the code, and I'm going to re-read some
of the dyninst papers first -- I thought checking in with
you might be the thing to do, as you might have a better
knowledge of that scenario than the papers do.

Bolo -- Josef T. Burger