Branch: refs/heads/master
Home: https://github.com/dyninst/dyninst
Commit: ff1e7bca971f4124ca179fbc862eda403475a5fd
https://github.com/dyninst/dyninst/commit/ff1e7bca971f4124ca179fbc862eda403475a5fd
Author: bbiiggppiigg <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-10 (Wed, 10 Jun 2026)
Changed paths:
M dyninstAPI/src/emit-x86.C
Log Message:
-----------
Preserve caller-saved GPRs clobbered by an inserted instrumentation call (#2288)
* Preserve caller-saved GPRs clobbered by an inserted instrumentation call
The base trampoline's register guard decided which registers to save using
intra-procedural liveness (shouldSaveReg in emit-x86.C). A caller-saved GPR
marked "dead" at the instrumentation point was skipped -- correct for a
standard-ABI function, since a caller never keeps a caller-saved scratch
register live across a call.
That assumption is wrong for GCC local clones (.isra/.constprop/...), where GCC
allocates scratch registers jointly across the call between a clone and its callers:
the caller legitimately keeps e.g. %r11 live across the call because the clone
promises not to touch it. That contract is invisible from the callee, so Dyninst
skipped saving %r11 -- and the call made by the inserted snippet (here a coverage
reporter that calls printf) then clobbered it, corrupting the caller's value.
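As a rough illustration of how a clone can be recognized from its symbol name, here is a minimal C++17 sketch. The helper looksLikeGccClone and its marker list are hypothetical; this is not the SymtabAPI Symbol::isClone implementation, and it checks only the two suffixes named above.

```cpp
#include <array>
#include <string_view>

// Hypothetical helper, NOT the SymtabAPI implementation: reports whether a
// symbol name contains one of the GCC local-clone markers mentioned in this
// message. GCC emits other clone suffixes too; this list is an assumption
// chosen for illustration and is deliberately incomplete.
inline bool looksLikeGccClone(std::string_view name) {
    static constexpr std::array<std::string_view, 2> kMarkers = {
        ".isra.", ".constprop."};
    for (std::string_view marker : kMarkers) {
        // A clone marker follows the original (mangled) name, so a plain
        // substring search is enough for this sketch.
        if (name.find(marker) != std::string_view::npos) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, a name such as "foo.isra.0" is classified as a clone, while an ordinary name such as "printf" is not.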
Observed on an instrumented PyTorch libtorch_python.so:
pybind11::detail::string_caster<std::string,false>::load keeps &local in %r11
across a call to std::string::operator=.isra.0; the instrumented operator='s
guard saved only {rsi,rdi}, printf clobbered %r11, and the following
_M_dispose dereferenced 0xffffffff -> SIGSEGV during `import torch`.
Fix: in shouldSaveReg, when the instrumented function is a clone
(SymtabAPI Symbol::isClone -- mangled name carries a GCC clone suffix), a
caller-saved GPR that the inserted snippet clobbers is saved even when
intra-procedural liveness marks it dead. The check is gated on isClone (and
checked first, as it is false for almost all functions) so ordinary functions
keep the liveness optimization -- they cannot have a caller holding a
caller-saved register live across the call. Callee-saved registers are
unaffected (the inserted call preserves them by ABI). No restore-side change is
needed: markSavedRegister() marks the register as spilled, the restore loop pops
every spilled register, and the num_to_save counting loop uses the same predicate,
so saves and pops stay balanced.
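The decision described above can be sketched as a simplified, self-contained C++ model. Everything here (RegState, shouldSaveSketch, countToSave, saveAll, restoreAll) is hypothetical and is not the code in emit-x86.C; in particular, the fallback to plain liveness stands in for whatever the existing liveness-based logic does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one GPR at an instrumentation point.
struct RegState {
    bool callerSaved;         // caller-saved under the ABI
    bool liveAtPoint;         // live per intra-procedural liveness
    bool clobberedBySnippet;  // written by the inserted snippet or its call
    bool spilled = false;     // set once the register has been pushed
};

// Sketch of the save predicate. The clone check comes first because it is
// false for almost all functions.
inline bool shouldSaveSketch(const RegState& r, bool funcIsClone) {
    if (funcIsClone && r.callerSaved && r.clobberedBySnippet) {
        return true;  // save even when liveness marks the register dead
    }
    return r.liveAtPoint;  // assumed stand-in for the existing liveness rule
}

// Counting uses the same predicate as saving, so the two cannot disagree.
inline std::size_t countToSave(const std::vector<RegState>& regs, bool funcIsClone) {
    std::size_t n = 0;
    for (const RegState& r : regs) {
        if (shouldSaveSketch(r, funcIsClone)) ++n;
    }
    return n;
}

// "Push" every register the predicate selects and mark it spilled.
inline std::size_t saveAll(std::vector<RegState>& regs, bool funcIsClone) {
    std::size_t pushes = 0;
    for (RegState& r : regs) {
        if (shouldSaveSketch(r, funcIsClone)) {
            r.spilled = true;
            ++pushes;
        }
    }
    return pushes;
}

// "Pop" every spilled register; no knowledge of the predicate is needed.
inline std::size_t restoreAll(std::vector<RegState>& regs) {
    std::size_t pops = 0;
    for (RegState& r : regs) {
        if (r.spilled) {
            r.spilled = false;
            ++pops;
        }
    }
    return pops;
}
```

In this model the restore side keys only on the spilled flag, which is why widening the save predicate for clones does not require a matching change on the restore path.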
Verified on x86-64: the operator=.isra.0 guard now pushes the full caller-saved
set {rax,r10,r11,r8,r9,rcx,rdx,rsi,rdi}, and instrumented `import torch` exits 0
with results identical to the uninstrumented baseline. (Note: writing an
instrumented libtorch_python.so also requires the separate insertion-point /
program-header emit fix; the run above used a build that included it.)
The AArch64 and PPC base trampolines may need an analogous change.
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>