Branch: refs/heads/bbiiggppiigg/speedup-pytorch
Home: https://github.com/dyninst/dyninst
Commit: 2bfc47d9f9679634dd975eba810a1e8682e9534a
https://github.com/dyninst/dyninst/commit/2bfc47d9f9679634dd975eba810a1e8682e9534a
Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-10 (Wed, 10 Jun 2026)
Changed paths:
M symtabAPI/src/emitElf.C
M symtabAPI/src/emitElf.h
Log Message:
-----------
Fix failing to write instrumented shared library (#2081)
Rewriting some shared libraries (e.g. libtorch_python.so) produced
broken or unloadable output. Two assumptions in the ELF emitter break
on binaries whose program/section layout is unusual; both must be fixed.
1. Insertion point for new loadable sections.
findSegmentEnds() computed dataSegEnd as max(p_vaddr + p_memsz) over
all PT_LOAD segments, and driver() only appended the new instrumentation
sections when some section satisfied sh_addr + sh_size == dataSegEnd.
When the highest-addressed loadable segment holds relocated read-only
metadata (.gnu.hash/.dynstr) and is padded past its last section, no
section ends on that boundary, so createLoadableSections() was never
called and the result came out with a zero-sized .dynamic ("object file
has no dynamic section"). Replace findSegmentEnds() with
findLastLoadableSec(), which returns the start address of the
highest-addressed section contained in the last loadable segment; the
trigger becomes shdr->sh_addr == lastLoadableSecStart. This is
equivalent to the previous behavior for ordinary binaries (still
triggers on .bss).
2. Program-header insertion order.
fixPhdrs() located the slot for the new PT_LOAD by scanning for the
first LOAD -> non-LOAD transition, assuming PT_LOAD entries are
contiguous in the program header table. libtorch_python.so has
GNU_STACK as the second program header (right after the first LOAD),
so the new highest-vaddr segment was inserted at index 1, leaving the
PT_LOAD entries unsorted by p_vaddr. glibc's _dl_map_segments relies on
ascending p_vaddr order and crashes when the entries are out of order.
Insert the new segment before the
first PT_LOAD whose p_vaddr is greater than newSegmentStart (else append
at the end), keeping PT_LOAD entries sorted regardless of interspersed
non-loadable entries.
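The insertion rule in item 2 can be sketched as below. This is a minimal illustration, not the code in emitElf.C: the Phdr struct is a simplified stand-in for the real ELF program header, and findLoadInsertIndex is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an ELF program header (assumption: only the
// fields needed for the ordering decision are modeled).
struct Phdr {
    std::uint32_t type;
    std::uint64_t vaddr;
};

constexpr std::uint32_t kPtLoad = 1;               // PT_LOAD
constexpr std::uint32_t kPtGnuStack = 0x6474e551;  // PT_GNU_STACK

// Hypothetical helper: index at which a new PT_LOAD starting at
// newSegmentStart should be inserted so PT_LOAD entries stay sorted by
// vaddr. Non-loadable entries are ignored wherever they appear.
std::size_t findLoadInsertIndex(const std::vector<Phdr>& phdrs,
                                std::uint64_t newSegmentStart) {
    for (std::size_t i = 0; i < phdrs.size(); ++i) {
        if (phdrs[i].type == kPtLoad && phdrs[i].vaddr > newSegmentStart)
            return i;  // insert before the first higher-addressed PT_LOAD
    }
    return phdrs.size();  // no higher PT_LOAD: append at the end
}
```

With a table laid out as LOAD, GNU_STACK, LOAD, LOAD, a new segment above all existing addresses is appended at the end rather than placed at index 1.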
With both fixes, the rewritten library has a valid .dynamic, a correctly
ordered new loadable segment, and loads under the dynamic linker the same
way the original does; instrumenting and running ordinary binaries is
unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>
Commit: 58f93cbbc0f8cdeb9c57da08f0154bf6a834f06a
https://github.com/dyninst/dyninst/commit/58f93cbbc0f8cdeb9c57da08f0154bf6a834f06a
Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-11 (Thu, 11 Jun 2026)
Changed paths:
M dyninstAPI/src/Relocation/CodeBuffer.C
M dyninstAPI/src/Relocation/CodeMover.C
M dyninstAPI/src/addressSpace.C
M dyninstAPI/src/debug.C
M dyninstAPI/src/debug.h
M dyninstAPI/src/image.C
M parseAPI/src/Parser-speculative.C
Log Message:
-----------
Add DYNINST_DEBUG_PROGRESS rewrite-progress reporting
Rewriting a very large binary (e.g. a multi-hundred-MB shared library)
spends long stretches in silent loops -- building reloc blocks, applying
transforms, code generation, the emit address-fixpoint, and speculative
gap parsing -- so the tool can look hung for many minutes with no output.
Add a coarse, opt-in progress channel gated on the DYNINST_DEBUG_PROGRESS
environment variable (and the dyn_debug_progress flag), following the
existing debug-channel convention in debug.{h,C}. Each progress line is
prefixed with a wall-clock timestamp so phases can be timed directly from
the log.
Instrumented phases:
- CodeMover: per-function reloc-block build loop and per-RelocBlock
codegen loop (periodic counts + "codegen done").
- addressSpace::relocateInt/generateCode: phase markers (transforms,
code generation, emit address-fixpoint attempts, patching).
- CodeBuffer::generate: per-pass / periodic progress over the buffer
elements in the do/while regeneration fixpoint.
- image::analyzeImage: brackets around the (single-threaded) speculative
gap-parse phase per text region.
- Parser-speculative probabilistic_gap_parsing: self-contained,
DYNINST_DEBUG_PROGRESS-gated timestamped progress (parseAPI sits below
dyninstAPI and cannot use its progress channel directly).
No behavior change when the env var is unset.
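An environment-gated, timestamped progress line of the kind described above might look like the sketch below. The function names are hypothetical and do not come from debug.{h,C}; the sketch assumes the channel is enabled whenever the variable is set to any value, and it formats the timestamp in UTC for simplicity.

```cpp
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <string>

// Assumption: the channel is on whenever DYNINST_DEBUG_PROGRESS is set.
bool progressEnabled() {
    return std::getenv("DYNINST_DEBUG_PROGRESS") != nullptr;
}

// Builds "[HH:MM:SS] message" from a wall-clock time (UTC in this sketch).
std::string formatProgressLine(const std::string& message, std::time_t now) {
    char stamp[16];
    const std::tm* utc = std::gmtime(&now);
    if (utc == nullptr ||
        std::strftime(stamp, sizeof stamp, "%H:%M:%S", utc) == 0)
        return message;  // fall back to the bare message
    return "[" + std::string(stamp) + "] " + message;
}

// Emits one progress line to stderr; does nothing when the gate is off.
void reportProgress(const std::string& message) {
    if (!progressEnabled()) return;
    std::fprintf(stderr, "%s\n",
                 formatProgressLine(message, std::time(nullptr)).c_str());
}
```

A long loop would call reportProgress every N iterations rather than on every element, so the output stays coarse.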
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>
Commit: c248a67f080b902f1a0fb96510498baf44b199cd
https://github.com/dyninst/dyninst/commit/c248a67f080b902f1a0fb96510498baf44b199cd
Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-11 (Thu, 11 Jun 2026)
Changed paths:
M dyninstAPI/src/mapped_object.C
Log Message:
-----------
Add DYNINST_NO_GAP_PARSE to skip speculative gap parsing
On a very large binary, getProcedures() / image::analyzeImage() spends the
bulk of its time in speculative idiom-matching gap parsing
(CodeObject::parseGaps -> probabilistic_gap_parsing). That phase is
single-threaded and roughly O(functions x gaps) -- getGapRange rebuilds the
full function-extent set on every gap and finalize() runs each iteration --
so on a ~400 MB shared library it can take hours and dominate the rewrite, while
the parallel recursive-descent parse that precedes it finishes in seconds.
Gate gap parsing on a new DYNINST_NO_GAP_PARSE environment variable, checked
in mapped_object::createMappedObject before parseImage (covers all create
paths). When set, parseGaps is forced false. Default behavior is unchanged.
Trade-off: functions reachable only via gap heuristics (no symbol,
unreachable from parsed code) are not discovered, which is acceptable for
symbol-rich libraries where the named functions are still parsed by the
normal recursive descent.
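The gate itself is small; a sketch under stated assumptions follows. effectiveParseGaps is a hypothetical helper, not the code in mapped_object.C, and it assumes that any value of the variable (including an empty one) disables gap parsing.

```cpp
#include <cstdlib>

// Pure decision: noGapParseEnv is the result of getenv(), or nullptr when
// the variable is unset. Assumption: presence alone forces parseGaps off.
bool effectiveParseGaps(bool requested, const char* noGapParseEnv) {
    if (noGapParseEnv != nullptr) return false;
    return requested;
}

// Convenience wrapper that reads the real environment.
bool effectiveParseGaps(bool requested) {
    return effectiveParseGaps(requested, std::getenv("DYNINST_NO_GAP_PARSE"));
}
```

Keeping the decision separate from the getenv() call lets it be exercised without touching the process environment.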
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>
Commit: dff944794f80b0e54d01e8ff598d6328939206cb
https://github.com/dyninst/dyninst/commit/dff944794f80b0e54d01e8ff598d6328939206cb
Author: wuxx1279 <bbiiggppiigg@xxxxxxxxx>
Date: 2026-06-11 (Thu, 11 Jun 2026)
Changed paths:
M dyninstAPI/src/Relocation/CodeBuffer.C
M dyninstAPI/src/Relocation/CodeMover.C
M dyninstAPI/src/addressSpace.C
M dyninstAPI/src/debug.C
M dyninstAPI/src/debug.h
M dyninstAPI/src/image.C
M dyninstAPI/src/mapped_object.C
M parseAPI/src/Parser-speculative.C
Log Message:
-----------
Integrate emit fix (#2081) + progress reporting + gap-parse gate
Octopus merge of three independent branches for combined testing:
- fix-2081-emit-insertion-point: PT_LOAD/phdr insertion-point fix (#2081)
- progress-reporting: DYNINST_DEBUG_PROGRESS rewrite progress
- disable-gap-parsing: DYNINST_NO_GAP_PARSE env gate
The three touch disjoint files (emitElf.{C,h} / progress channel + callers /
mapped_object.C), so they combine without conflict.
Co-Authored-By: Claude Opus 4.8 <noreply@xxxxxxxxxxxxx>
Compare: https://github.com/dyninst/dyninst/compare/2bfc47d9f967%5E...dff944794f80