Branch: refs/heads/bbiiggppiigg/fix-2081
Home: https://github.com/dyninst/dyninst
Commit: ff3b3af96711b853c95225bb67367447422646a2
https://github.com/dyninst/dyninst/commit/ff3b3af96711b853c95225bb67367447422646a2
Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-04 (Thu, 04 Jun 2026)
Changed paths:
M dyninstAPI/src/emit-x86.C
Log Message:
-----------
Preserve caller-saved GPRs clobbered by instrumentation (#2081 follow-up)
The base trampoline's register guard decided which registers to save using
intra-procedural liveness (shouldSaveReg in emit-x86.C). A caller-saved GPR
marked "dead" at the instrumentation point was skipped -- correct for a
standard-ABI function, since a caller never keeps a caller-saved scratch
register live across a call.
That assumption is wrong for GCC IPA-SRA local clones (.isra/.constprop),
which co-allocate scratch registers across the call between a clone and its
callers: the caller legitimately keeps e.g. %r11 live across the call because
the clone promises not to touch it. That contract is invisible from the callee,
so dyninst skipped saving %r11 -- and the inserted snippet's call
(FEntryCoverage -> printf) then clobbered it, corrupting the caller's value.
Observed on an instrumented PyTorch libtorch_python.so:
pybind11::detail::string_caster<std::string,false>::load keeps &local in %r11
across a call to std::string::operator=.isra.0; the guard emitted for the
instrumented operator= saved only {rsi,rdi}, printf clobbered %r11, and the
following _M_dispose dereferenced 0xffffffff -> SIGSEGV during `import torch`.
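
The failure can be reproduced in miniature with a toy register-file model. This
is only an illustrative sketch, not dyninst code: Reg, RegFile, kCallerSaved,
snippetCall and runInstrumented are hypothetical names, and the trash value
written by snippetCall is arbitrary. It assumes the System V AMD64 convention,
in which rax, rcx, rdx, rsi, rdi and r8-r11 are caller-saved.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical GPR indices for this sketch (not dyninst's register numbering).
enum Reg : std::size_t {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8, R9, R10, R11, R12, R13, R14, R15, NUM_REGS
};
using RegFile = std::array<std::uint64_t, NUM_REGS>;

// Caller-saved GPRs under the System V AMD64 ABI.
const std::vector<Reg> kCallerSaved = {RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11};

// Stand-in for the inserted snippet's call (e.g. into printf): an ABI-conforming
// callee may overwrite every caller-saved GPR, so the model trashes all of them.
void snippetCall(RegFile& regs) {
    for (Reg r : kCallerSaved) regs[r] = 0xffffffffu;  // arbitrary trash value
}

// Models the guard: push the save set, run the snippet, pop in reverse order.
RegFile runInstrumented(RegFile regs, const std::vector<Reg>& saveSet) {
    std::vector<std::uint64_t> stack;
    for (Reg r : saveSet) stack.push_back(regs[r]);
    snippetCall(regs);
    for (auto it = saveSet.rbegin(); it != saveSet.rend(); ++it) {
        regs[*it] = stack.back();
        stack.pop_back();
    }
    return regs;
}
```

With a save set of {RSI, RDI}, a value the caller parked in R11 does not survive
runInstrumented; with the full caller-saved set it does.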
Fix: in shouldSaveReg, a caller-saved GPR that the inserted snippet clobbers is
now saved even when intra-procedural liveness marks it dead. Callee-saved
registers are unaffected (the inserted call preserves them by ABI), so they are
not over-saved. No restore-side change is needed: markSavedRegister() marks the
register as spilled, the restore loop pops every spilled register, and the
num_to_save counting loop uses the same predicate, so pushes and pops stay
balanced.
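
As a rough illustration of the predicate change, here is a self-contained
sketch. It is not the code in emit-x86.C: Gpr, GprSet, makeSet, callerSavedSet,
shouldSaveOld, shouldSaveNew and countSaves are hypothetical names, and the
"old" rule is modelled simply as "save if live", which is an assumption drawn
from the description above. The counting helper uses the same predicate as the
save decision, which is what keeps the number of pushes and pops equal.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>

// Hypothetical GPR enumeration for this sketch (not dyninst's numbering).
enum class Gpr : std::size_t {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8, R9, R10, R11, R12, R13, R14, R15, Count
};
constexpr std::size_t kNumGprs = static_cast<std::size_t>(Gpr::Count);
using GprSet = std::bitset<kNumGprs>;

inline GprSet makeSet(std::initializer_list<Gpr> regs) {
    GprSet s;
    for (Gpr r : regs) s.set(static_cast<std::size_t>(r));
    return s;
}

// Caller-saved GPRs under the System V AMD64 ABI.
inline GprSet callerSavedSet() {
    return makeSet({Gpr::RAX, Gpr::RCX, Gpr::RDX, Gpr::RSI, Gpr::RDI,
                    Gpr::R8, Gpr::R9, Gpr::R10, Gpr::R11});
}

// Assumed old rule: save a register only if liveness says it is live.
inline bool shouldSaveOld(Gpr r, const GprSet& live) {
    return live.test(static_cast<std::size_t>(r));
}

// New rule: additionally save any caller-saved GPR the snippet clobbers,
// even when intra-procedural liveness marks it dead.
inline bool shouldSaveNew(Gpr r, const GprSet& live, const GprSet& clobbered) {
    const std::size_t i = static_cast<std::size_t>(r);
    return live.test(i) || (callerSavedSet().test(i) && clobbered.test(i));
}

// Counting loop that reuses the save predicate, so the count matches the pushes.
inline std::size_t countSaves(const GprSet& live, const GprSet& clobbered) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < kNumGprs; ++i)
        if (shouldSaveNew(static_cast<Gpr>(i), live, clobbered)) ++n;
    return n;
}
```

For the operator=.isra.0 case, with {RSI, RDI} live and a snippet call that may
clobber every caller-saved GPR, the new predicate selects R11 while leaving
callee-saved registers such as RBX alone.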
Verified: the operator=.isra.0 guard now pushes the full caller-saved set
{rax,r10,r11,r8,r9,rcx,rdx,rsi,rdi}; instrumented `import torch` exits 0 and
produces results identical to the uninstrumented baseline. This change covers
x86-64 only; the AArch64/PPC base trampolines may need an analogous change.
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>