[DynInst_API:] [dyninst/dyninst] 2bfc47: Fix failing to write instrumented shared library (...


Date: Thu, 11 Jun 2026 01:10:08 -0700
From: bbiiggppiigg <noreply@xxxxxxxxxx>
Subject: [DynInst_API:] [dyninst/dyninst] 2bfc47: Fix failing to write instrumented shared library (...
  Branch: refs/heads/bbiiggppiigg/speedup-pytorch
  Home:   https://github.com/dyninst/dyninst
  Commit: 2bfc47d9f9679634dd975eba810a1e8682e9534a
      https://github.com/dyninst/dyninst/commit/2bfc47d9f9679634dd975eba810a1e8682e9534a
  Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
  Date:   2026-06-10 (Wed, 10 Jun 2026)

  Changed paths:
    M symtabAPI/src/emitElf.C
    M symtabAPI/src/emitElf.h

  Log Message:
  -----------
  Fix failing to write instrumented shared library (#2081)

Rewriting some shared libraries (e.g. libtorch_python.so) produced a
broken or unloadable output. Two assumptions in the ELF emitter break
on binaries whose program/section layout is unusual; both must be fixed.

1. Insertion point for new loadable sections.
   findSegmentEnds() computed dataSegEnd as max(p_vaddr + p_memsz) over
   all PT_LOAD segments, and driver() only appended the new instrumentation
   sections when some section satisfied sh_addr + sh_size == dataSegEnd.
   When the highest-addressed loadable segment holds relocated read-only
   metadata (.gnu.hash/.dynstr) and is padded past its last section, no
   section ends on that boundary, so createLoadableSections() was never
   called and the result came out with a zero-sized .dynamic ("object file
   has no dynamic section"). Replace findSegmentEnds() with
   findLastLoadableSec(), which returns the start address of the
   highest-addressed section contained in the last loadable segment; the
   trigger becomes shdr->sh_addr == lastLoadableSecStart. This is
   equivalent to the previous behavior for ordinary binaries (still
   triggers on .bss).

2. Program-header insertion order.
   fixPhdrs() located the slot for the new PT_LOAD by scanning for the
   first LOAD -> non-LOAD transition, assuming PT_LOAD entries are
   contiguous in the program header table. libtorch_python.so has
   GNU_STACK as the second program header (right after the first LOAD),
   so the new highest-vaddr segment was inserted at index 1, leaving the
   PT_LOAD entries unsorted by p_vaddr. glibc's _dl_map_segments relies on
   ascending p_vaddr order and crashes when the PT_LOAD entries are out of
   order. Insert the new segment before the
   first PT_LOAD whose p_vaddr is greater than newSegmentStart (else append
   at the end), keeping PT_LOAD entries sorted regardless of interspersed
   non-loadable entries.
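
The insertion rule in item 2 can be sketched as follows. This is a minimal
illustration, not the code in emitElf.C: the Phdr struct and the function
names newLoadIndex and firstTransitionIndex are hypothetical, and the real
emitter works on the ELF program header types rather than this stand-in.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an ELF program header; only the two fields
// the rule needs are modeled.
struct Phdr {
    std::uint32_t type;   // p_type
    std::uint64_t vaddr;  // p_vaddr
};

constexpr std::uint32_t kPtLoad = 1;               // PT_LOAD
constexpr std::uint32_t kPtGnuStack = 0x6474e551;  // PT_GNU_STACK

// New rule: insert before the first PT_LOAD whose p_vaddr is greater than
// newSegmentStart, else append at the end. Non-loadable entries are skipped,
// so they cannot influence the position.
std::size_t newLoadIndex(const std::vector<Phdr>& phdrs,
                         std::uint64_t newSegmentStart) {
    for (std::size_t i = 0; i < phdrs.size(); ++i) {
        if (phdrs[i].type == kPtLoad && phdrs[i].vaddr > newSegmentStart) {
            return i;
        }
    }
    return phdrs.size();
}

// Old rule, as described in the log message: the slot is the first
// LOAD -> non-LOAD transition in the table.
std::size_t firstTransitionIndex(const std::vector<Phdr>& phdrs) {
    for (std::size_t i = 1; i < phdrs.size(); ++i) {
        if (phdrs[i - 1].type == kPtLoad && phdrs[i].type != kPtLoad) {
            return i;
        }
    }
    return phdrs.size();
}
```

For a table laid out as LOAD, GNU_STACK, LOAD, LOAD, the old rule picks
index 1, while the new rule appends a segment whose start lies above every
existing PT_LOAD.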

With both fixes, the rewritten library has a valid .dynamic, a correctly
ordered new loadable segment, and loads under the dynamic linker the same
way the original does; instrumenting and running ordinary binaries is
unaffected.
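
The difference between the old and new trigger in item 1 can be sketched as
below. The Segment and Section structs and both function names are
hypothetical, and the sketch assumes that "last loadable segment" means the
PT_LOAD with the highest p_vaddr; the actual findLastLoadableSec() may
differ in detail.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simplified views: PT_LOAD segments and allocated sections.
struct Segment { std::uint64_t vaddr; std::uint64_t memsz; };
struct Section { std::uint64_t addr; std::uint64_t size; };

// Old trigger: fires only if some section ends exactly at
// dataSegEnd = max(p_vaddr + p_memsz) over the PT_LOAD segments.
bool oldTriggerFires(const std::vector<Segment>& loads,
                     const std::vector<Section>& secs) {
    std::uint64_t dataSegEnd = 0;
    for (const Segment& s : loads) {
        if (s.vaddr + s.memsz > dataSegEnd) dataSegEnd = s.vaddr + s.memsz;
    }
    for (const Section& sec : secs) {
        if (sec.addr + sec.size == dataSegEnd) return true;
    }
    return false;
}

// New trigger target: start address of the highest-addressed section that
// lies inside the last loadable segment (assumed: highest p_vaddr).
// Returns false if no section qualifies; otherwise writes to `start`.
bool lastLoadableSecStart(const std::vector<Segment>& loads,
                          const std::vector<Section>& secs,
                          std::uint64_t& start) {
    if (loads.empty()) return false;
    const Segment* last = &loads[0];
    for (const Segment& s : loads) {
        if (s.vaddr > last->vaddr) last = &s;
    }
    bool found = false;
    for (const Section& sec : secs) {
        const bool inside = sec.addr >= last->vaddr &&
                            sec.addr + sec.size <= last->vaddr + last->memsz;
        if (inside && (!found || sec.addr > start)) {
            start = sec.addr;
            found = true;
        }
    }
    return found;
}
```

When the last segment is padded past its final section, no section ends on
dataSegEnd and the old trigger never fires, but the start address of that
final section is still well defined.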

Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>


  Commit: 58f93cbbc0f8cdeb9c57da08f0154bf6a834f06a
      https://github.com/dyninst/dyninst/commit/58f93cbbc0f8cdeb9c57da08f0154bf6a834f06a
  Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
  Date:   2026-06-11 (Thu, 11 Jun 2026)

  Changed paths:
    M dyninstAPI/src/Relocation/CodeBuffer.C
    M dyninstAPI/src/Relocation/CodeMover.C
    M dyninstAPI/src/addressSpace.C
    M dyninstAPI/src/debug.C
    M dyninstAPI/src/debug.h
    M dyninstAPI/src/image.C
    M parseAPI/src/Parser-speculative.C

  Log Message:
  -----------
  Add DYNINST_DEBUG_PROGRESS rewrite-progress reporting

Rewriting a very large binary (e.g. a multi-hundred-MB shared library)
spends long stretches in silent loops -- building reloc blocks, applying
transforms, code generation, the emit address-fixpoint, and speculative
gap parsing -- so the tool can look hung for many minutes with no output.

Add a coarse, opt-in progress channel gated on the DYNINST_DEBUG_PROGRESS
environment variable (and the dyn_debug_progress flag), following the
existing debug-channel convention in debug.{h,C}. Each progress line is
prefixed with a wall-clock timestamp so phases can be timed directly from
the log.
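
A rough sketch of an environment-gated, timestamped progress channel of this
kind is shown below. The function names are hypothetical and this is not the
code added to debug.{h,C}. The sketch assumes that any value of the variable
enables the channel, and it re-reads the environment on every call, which
the real implementation may not do.

```cpp
#include <cstdlib>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Assumption: the channel is on whenever the variable is set at all.
bool progressEnabled() {
    return std::getenv("DYNINST_DEBUG_PROGRESS") != nullptr;
}

// Prefix a message with a wall-clock timestamp, e.g. "[14:03:27] msg".
// std::localtime is not thread-safe; acceptable for a single-threaded sketch.
std::string formatProgressLine(std::time_t now, const std::string& msg) {
    std::ostringstream out;
    out << '[' << std::put_time(std::localtime(&now), "%H:%M:%S") << "] " << msg;
    return out.str();
}

// Emit one progress line if the channel is enabled; otherwise do nothing.
void reportProgress(const std::string& msg, std::ostream& os = std::cerr) {
    if (!progressEnabled()) return;
    os << formatProgressLine(std::time(nullptr), msg) << '\n';
}
```

With the variable unset, reportProgress() writes nothing, which matches the
opt-in behavior described above.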

Instrumented phases:
  - CodeMover: per-function reloc-block build loop and per-RelocBlock
    codegen loop (periodic counts + "codegen done").
  - addressSpace::relocateInt/generateCode: phase markers (transforms,
    code generation, emit address-fixpoint attempts, patching).
  - CodeBuffer::generate: per-pass / periodic progress over the buffer
    elements in the do/while regeneration fixpoint.
  - image::analyzeImage: brackets around the (single-threaded) speculative
    gap-parse phase per text region.
  - Parser-speculative probabilistic_gap_parsing: self-contained,
    DYNINST_DEBUG_PROGRESS-gated timestamped progress (parseAPI sits below
    dyninstAPI and cannot use its progress channel directly).

No behavior change when the env var is unset.
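
The "periodic counts" idea from the phase list could look roughly like the
sketch below: report every N items and once more at the end, so a long loop
never goes silent. The helper name runWithProgress, its parameters, and the
message format are all hypothetical, not taken from the commit.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Run work(i) for i in [0, total). Call report() after every `interval`
// completed items and once more when the loop finishes.
// An interval of 0 disables the periodic reports.
// Returns the number of report() calls made.
std::size_t runWithProgress(
    std::size_t total, std::size_t interval,
    const std::function<void(std::size_t)>& work,
    const std::function<void(const std::string&)>& report) {
    std::size_t reports = 0;
    for (std::size_t i = 0; i < total; ++i) {
        work(i);
        const std::size_t done = i + 1;
        if (interval != 0 && done % interval == 0) {
            report(std::to_string(done) + "/" + std::to_string(total));
            ++reports;
        }
    }
    report("done " + std::to_string(total) + "/" + std::to_string(total));
    return reports + 1;
}
```

Choosing a coarse interval keeps the log short even when the loop body runs
millions of times.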

Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>


  Commit: c248a67f080b902f1a0fb96510498baf44b199cd
      https://github.com/dyninst/dyninst/commit/c248a67f080b902f1a0fb96510498baf44b199cd
  Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
  Date:   2026-06-11 (Thu, 11 Jun 2026)

  Changed paths:
    M dyninstAPI/src/mapped_object.C

  Log Message:
  -----------
  Add DYNINST_NO_GAP_PARSE to skip speculative gap parsing

On a very large binary, getProcedures() / image::analyzeImage() spends the
bulk of its time in speculative idiom-matching gap parsing
(CodeObject::parseGaps -> probabilistic_gap_parsing). That phase is
single-threaded and roughly O(functions x gaps) -- getGapRange rebuilds the
full function-extent set on every gap and finalize() runs each iteration --
so on a ~400 MB shared library it can take hours and dominate the rewrite,
while the parallel recursive-descent parse that precedes it finishes in
seconds.

Gate gap parsing on a new DYNINST_NO_GAP_PARSE environment variable, checked
in mapped_object::createMappedObject before parseImage (covers all create
paths). When set, parseGaps is forced false. Default behavior is unchanged.
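
In outline, the gate amounts to something like the following sketch. The
function name effectiveParseGaps is hypothetical, and the sketch assumes
that any value of the variable (including an empty one) counts as "set";
the check in mapped_object::createMappedObject may be stricter.

```cpp
#include <cstdlib>

// Decide whether gap parsing should run, given what the caller asked for.
// Assumption: DYNINST_NO_GAP_PARSE disables gap parsing whenever it is
// present in the environment, regardless of its value.
bool effectiveParseGaps(bool requested) {
    if (std::getenv("DYNINST_NO_GAP_PARSE") != nullptr) {
        return false;  // forced off by the environment
    }
    return requested;  // default behavior unchanged
}
```

Because the override can only turn gap parsing off, a caller that already
passes false sees no difference either way.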

Trade-off: functions that only the gap heuristics can find (no symbol, and
unreachable from parsed code) are not discovered. This is acceptable for
symbol-rich libraries, where the named functions are still parsed by the
normal recursive descent.

Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>


  Commit: dff944794f80b0e54d01e8ff598d6328939206cb
      https://github.com/dyninst/dyninst/commit/dff944794f80b0e54d01e8ff598d6328939206cb
  Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
  Date:   2026-06-11 (Thu, 11 Jun 2026)

  Changed paths:
    M dyninstAPI/src/Relocation/CodeBuffer.C
    M dyninstAPI/src/Relocation/CodeMover.C
    M dyninstAPI/src/addressSpace.C
    M dyninstAPI/src/debug.C
    M dyninstAPI/src/debug.h
    M dyninstAPI/src/image.C
    M dyninstAPI/src/mapped_object.C
    M parseAPI/src/Parser-speculative.C

  Log Message:
  -----------
  Integrate emit fix (#2081) + progress reporting + gap-parse gate

Octopus merge of three independent branches for combined testing:
  - fix-2081-emit-insertion-point: PT_LOAD/phdr insertion-point fix (#2081)
  - progress-reporting:            DYNINST_DEBUG_PROGRESS rewrite progress
  - disable-gap-parsing:           DYNINST_NO_GAP_PARSE env gate

The three branches touch disjoint files (emitElf.{C,h} / progress channel +
callers / mapped_object.C), so they combine without conflict.

Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>


Compare: https://github.com/dyninst/dyninst/compare/2bfc47d9f967%5E...dff944794f80
