Branch: refs/heads/bbiiggppiigg/amdgpu-func-call-protoype
Home: https://github.com/dyninst/dyninst
Commit: c31ce1861c16b0e60f603d08ef6478e47f1da264
https://github.com/dyninst/dyninst/commit/c31ce1861c16b0e60f603d08ef6478e47f1da264
Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-09 (Tue, 09 Jun 2026)
Changed paths:
A AMDGPU_CALL_SUPPORT_NOTES.md
M dyninstAPI/src/BPatch/BPatch_image.C
M dyninstAPI/src/addressSpace.C
M dyninstAPI/src/emit-amdgpu.C
M dyninstAPI/src/emit-amdgpu.h
M dyninstAPI/src/inst-amdgpu.C
M dyninstAPI/src/parse_func.C
M dyninstAPI/src/trampolines/baseTramp.C
M symtabAPI/CMakeLists.txt
M symtabAPI/src/Symtab-lookup.C
M symtabAPI/src/Symtab.C
A symtabAPI/src/relocationEntry-elf-amdgpu.C
Log Message:
-----------
AMDGPU: inter-module emitCall via GOT slot + S_SWAPPC, loader-aware reloc
Enable EmitterAmdgpuGfx908::emitCall for the inter-module (external) case:
a snippet that calls a function defined in another code object now emits a
per-callee 8-byte slot, a dynamic relocation against the callee, and a
PC-relative load + S_SWAPPC_B64 indirect call.
What this does:
- inst-amdgpu.C: implement getInterModuleFuncAddr (slot via inferiorMalloc +
addDependentRelocation), mirroring the x86/aarch64 GOT pattern.
- emit-amdgpu.{h,C}: emitIndirectCall computes the slot address PC-relative
(S_GETPC_B64 + 64-bit add of the link-time delta, same idiom as
emitLongJump), NOT an absolute immediate, which would be wrong for a
position-independent code object, then S_LOAD_DWORDX2 + S_SWAPPC_B64.
emitCall wires the external path; clobberAllFuncCall added (conservative).
- baseTramp.C: guarded() returns false on AMDGPU. The recursion guard expands
to DYNINST_lock/unlock_tramp_guard() (host-only runtime) and its If-condition
needs a value-returning call, which AMDGPU emitCall does not provide; without
this, any func-call snippet aborted at operatorAST.C:273 "returned register
invalid".
- symtabAPI Symtab.C addExternalSymbolReference: on AMDGPU, force the
relocation placeholder to ST_NOTYPE and retarget it to ".dyninst.<callee>"
(both .dynsym and .symtab, plus the reloc name). Verified against ROCr loader
source: ApplyDynamicRelocation only name-resolves STT_NOTYPE symbols against
agent_symbols_, and PullElf only registers STT_OBJECT/kernels, so an
STT_FUNC reloc resolves locally to a segment base, not the callee. Pairs with
an external build step that exports STT_OBJECT ".dyninst.<callee>" aliases.
- relocationEntry-elf-amdgpu.C (new) + CMake: R_AMDGPU_* constants and
getGlobalRelType (R_AMDGPU_ABS64) since glibc elf.h lacks them.
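The PC-relative slot addressing described for emitIndirectCall above can be sketched on the host as follows. This is a minimal illustration, not the code in emit-amdgpu.C: the names DeltaHalves, splitDelta and addPcRelative are hypothetical, and the assumption that the 64-bit add is built from a 32-bit add plus an add-with-carry (on AMDGPU, typically an S_ADD_U32 / S_ADDC_U32 pair) is ours, not stated in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, not Dyninst code.
// The link-time delta between the address S_GETPC_B64 yields and the slot,
// split into the two 32-bit halves a scalar add / add-with-carry pair consumes.
struct DeltaHalves {
    uint32_t lo;
    uint32_t hi;
};

// Compute the delta as a 64-bit two's-complement value; a slot that lies
// below the PC simply produces a wrapped (negative) delta.
inline DeltaHalves splitDelta(uint64_t pcAfterGetpc, uint64_t slotAddr) {
    uint64_t delta = slotAddr - pcAfterGetpc;
    return { static_cast<uint32_t>(delta), static_cast<uint32_t>(delta >> 32) };
}

// Emulate the 64-bit add as two 32-bit steps: add the low halves, then add
// the high halves plus the carry out of the low add.
inline uint64_t addPcRelative(uint64_t pc, DeltaHalves d) {
    uint32_t pcLo = static_cast<uint32_t>(pc);
    uint32_t pcHi = static_cast<uint32_t>(pc >> 32);
    uint64_t sumLo = static_cast<uint64_t>(pcLo) + d.lo;
    uint32_t carry = static_cast<uint32_t>(sumLo >> 32);
    uint32_t hi = pcHi + d.hi + carry;  // wraps modulo 2^32 by design
    return (static_cast<uint64_t>(hi) << 32) | static_cast<uint32_t>(sumLo);
}
```

Because only the delta is baked into the instruction stream, the computed slot address stays correct wherever the loader places the code object, which is the reason the message gives for avoiding an absolute immediate.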
What is NOT done (see AMDGPU_CALL_SUPPORT_NOTES.md):
- Intra-module direct call: still asserts (deferred).
- Calling convention: arguments and return value not pinned; emitCall asserts
on non-empty operands and returns Null_Register. clobberAllFuncCall is
maximally conservative as a result.
- Live-process (non-BinaryEdit) indirect call: asserts.
- End-to-end load-time resolution not yet confirmed on hardware: the slot must
receive the callee entry (not a pointer-to-pointer), the provider and
consumer must load into one hsa_executable_t, and which symbol table
PullElf::getSymbolTable() reads (.dynsym vs .symtab) is untraced; the
.symtab mirror is currently a hedge.
- gfx90a/gfx940 gated alongside gfx908 but untested.
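The symbol-resolution rule this message attributes to the ROCr loader, and on which the unconfirmed end-to-end path depends, can be modelled with a toy sketch. Everything below is hypothetical (ToyLoader, SymType, aliasFor are illustrative names, and kernel registration is omitted); it encodes only the behaviour as described in this commit message and is not ROCr or Dyninst code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-ins for STT_NOTYPE / STT_OBJECT / STT_FUNC.
enum class SymType { NoType, Object, Func };

// Toy model of the loader behaviour described in this commit message.
struct ToyLoader {
    std::map<std::string, uint64_t> agentSymbols;  // stand-in for agent_symbols_

    // Per the message, only object symbols get registered for name lookup;
    // a function symbol exported by the provider is silently ignored here.
    void registerExported(const std::string& name, SymType type, uint64_t addr) {
        if (type == SymType::Object) agentSymbols[name] = addr;
    }

    // Resolve a dynamic relocation. Only NoType placeholders are looked up
    // by name; any other type resolves locally to the segment base.
    // Returns false when a NoType placeholder has no registered definition.
    bool resolve(const std::string& name, SymType type,
                 uint64_t segmentBase, uint64_t& out) const {
        if (type != SymType::NoType) { out = segmentBase; return true; }
        auto it = agentSymbols.find(name);
        if (it == agentSymbols.end()) return false;
        out = it->second;
        return true;
    }
};

// The alias naming scheme from the message: ".dyninst.<callee>".
inline std::string aliasFor(const std::string& callee) {
    return ".dyninst." + callee;
}
```

Under this model, a relocation against a function-typed placeholder would fill the slot with a segment base, while a NOTYPE placeholder named after an exported object alias would receive the callee address, which is why the change pairs the ST_NOTYPE retargeting with the external alias-export step.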
Also includes temporary debug fprintf tracing in BPatch_image.C,
addressSpace.C, parse_func.C, Symtab-lookup.C from the instrumentability
investigation (to be removed before merge).
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>