[DynInst_API:] [dyninst/dyninst] c31ce1: AMDGPU: inter-module emitCall via GOT slot + S_SWA...


Date: Tue, 09 Jun 2026 19:58:55 -0700
From: bbiiggppiigg <noreply@xxxxxxxxxx>
Subject: [DynInst_API:] [dyninst/dyninst] c31ce1: AMDGPU: inter-module emitCall via GOT slot + S_SWA...
  Branch: refs/heads/bbiiggppiigg/amdgpu-func-call-protoype
  Home:   https://github.com/dyninst/dyninst
  Commit: c31ce1861c16b0e60f603d08ef6478e47f1da264
      https://github.com/dyninst/dyninst/commit/c31ce1861c16b0e60f603d08ef6478e47f1da264
  Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
  Date:   2026-06-09 (Tue, 09 Jun 2026)

  Changed paths:
    A AMDGPU_CALL_SUPPORT_NOTES.md
    M dyninstAPI/src/BPatch/BPatch_image.C
    M dyninstAPI/src/addressSpace.C
    M dyninstAPI/src/emit-amdgpu.C
    M dyninstAPI/src/emit-amdgpu.h
    M dyninstAPI/src/inst-amdgpu.C
    M dyninstAPI/src/parse_func.C
    M dyninstAPI/src/trampolines/baseTramp.C
    M symtabAPI/CMakeLists.txt
    M symtabAPI/src/Symtab-lookup.C
    M symtabAPI/src/Symtab.C
    A symtabAPI/src/relocationEntry-elf-amdgpu.C

  Log Message:
  -----------
  AMDGPU: inter-module emitCall via GOT slot + S_SWAPPC, loader-aware reloc

Enable EmitterAmdgpuGfx908::emitCall for the inter-module (external) case:
a snippet that calls a function defined in another code object now emits a
per-callee 8-byte slot, a dynamic relocation against the callee, and a
PC-relative load + S_SWAPPC_B64 indirect call.

What this does:

- inst-amdgpu.C: implement getInterModuleFuncAddr (slot via inferiorMalloc +
  addDependentRelocation), mirroring the x86/aarch64 GOT pattern.
- emit-amdgpu.{h,C}: emitIndirectCall computes the slot address PC-relative
  (S_GETPC_B64 + 64-bit add of the link-time delta, the same idiom as
  emitLongJump), NOT as an absolute immediate (which would be wrong for a
  position-independent code object), then emits S_LOAD_DWORDX2 +
  S_SWAPPC_B64. emitCall wires up the external path; clobberAllFuncCall is
  added (conservative).
- baseTramp.C: guarded() returns false on AMDGPU. The recursion guard expands
  to DYNINST_lock/unlock_tramp_guard() (host-only runtime) and its If-condition
  needs a value-returning call, which AMDGPU emitCall does not provide; without
  this, any func-call snippet aborted at operatorAST.C:273 "returned register
  invalid".
- symtabAPI Symtab.C addExternalSymbolReference: on AMDGPU, force the
  relocation placeholder to ST_NOTYPE and retarget it to ".dyninst.<callee>"
  (both .dynsym and .symtab, plus the reloc name). Verified against ROCr loader
  source: ApplyDynamicRelocation only name-resolves STT_NOTYPE symbols against
  agent_symbols_, and PullElf only registers STT_OBJECT/kernels, so an
  STT_FUNC reloc resolves locally to a segment base, not the callee. Pairs with
  an external build step that exports STT_OBJECT ".dyninst.<callee>" aliases.
- relocationEntry-elf-amdgpu.C (new) + CMake: R_AMDGPU_* constants and
  getGlobalRelType (R_AMDGPU_ABS64) since glibc elf.h lacks them.

What is NOT done (see AMDGPU_CALL_SUPPORT_NOTES.md):

- Intra-module direct call: still asserts (deferred).
- Calling convention: arguments and return value not pinned; emitCall asserts
  on non-empty operands and returns Null_Register. clobberAllFuncCall is
  maximally conservative as a result.
- Live-process (non-BinaryEdit) indirect call: asserts.
- End-to-end load-time resolution not yet confirmed on hardware: the slot must
  receive the callee entry (not a pointer-to-pointer), the provider and
  consumer must load into one hsa_executable_t, and which symbol table
  PullElf::getSymbolTable() reads (.dynsym vs .symtab) is untraced â the
  .symtab mirror is currently a hedge.
- gfx90a/gfx940 gated alongside gfx908 but untested.

Also includes temporary debug fprintf tracing in BPatch_image.C,
addressSpace.C, parse_func.C, Symtab-lookup.C from the instrumentability
investigation (to be removed before merge).

Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>


