Daniel and Doug,
We did some experimentation with HPCToolkit yesterday. To fix your problems with the analysis of Fortran binaries, you need to install a new "develop" version of HPCToolkit with Dyninst master. The following complete recipe should work for CPU-only codes:
git clone https://github.com/spack/spack
source spack/share/spack/setup-env.sh
spack compiler find
spack install hpctoolkit@develop ^dyninst@master
spack load hpctoolkit
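
Once the recipe completes, a quick sanity check is to confirm that the main HPCToolkit commands resolve on PATH. The following is a minimal sketch, assuming a POSIX shell; `missing_tools` is a hypothetical helper name, not part of Spack or HPCToolkit.

```shell
# Report which of the named commands are missing from PATH.
# Prints one line per missing command; returns non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# After `spack load hpctoolkit`, these three should all resolve.
missing_tools hpcrun hpcstruct hpcprof || echo "HPCToolkit is not fully on PATH"
```

If anything is reported missing, re-running `spack load hpctoolkit` in the same shell is the first thing to try.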
Working with GPU codes requires a bit of fiddling with a Spack packages.yaml file to indicate where GPU components can be found, as documented here: https://hpctoolkit.org/software-instructions.html
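
As an illustration of what such a packages.yaml entry can look like, here is a minimal sketch that prints a fragment registering an existing CUDA installation as a Spack external. The version (12.2) and prefix (/usr/local/cuda-12.2) are hypothetical placeholders, and `cuda_external_yaml` is a helper name invented for this example; the instructions linked above remain the authoritative reference.

```shell
# Print a Spack packages.yaml fragment that registers an existing CUDA
# installation as an external package.
# $1 = CUDA version, $2 = install prefix (both site-specific).
cuda_external_yaml() {
    cat <<EOF
packages:
  cuda:
    externals:
    - spec: cuda@$1
      prefix: $2
    buildable: false
EOF
}

# Hypothetical values; substitute what your system actually provides.
cuda_external_yaml 12.2 /usr/local/cuda-12.2
```

The printed fragment would be merged into your Spack packages.yaml by hand. Which GPU variant to request on the hpctoolkit spec depends on your Spack version; `spack info hpctoolkit` lists the variants available.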
FYI: The need to change from hpctoolkit@xxxxxxxxx to hpctoolkit@develop is only because the API to Dyninst has changed since the 2024.01.1 snapshot. The older hpctoolkit won't build with the newer dyninst.
Best,
John
--
John Mellor-Crummey Professor
Dept of Computer Science Rice University
email: johnmc@xxxxxxxx phone: 713-348-5179
> On May 12, 2025, at 1:37 PM, John Mellor-Crummey <johnmc@xxxxxxxx> wrote:
>
> Using Daniel's gfs_model.x binary, I confirmed:
> - (bad) that hpcstruct in HPCToolkit version 2024.01.1, based on Dyninst 13.0.0, fails with this binary
> - (good) that the Dyninst problem with analyzing DWARF subrange information from Fortran applications has been fixed in Dyninst master.
>
>
> Unfortunately, Dyninst master is not usable with the HPCToolkit 2024.01.1 release. However, the updated version of Dyninst is usable with HPCToolkit's develop branch. Unfortunately, the spack recipe for deploying our develop branch seems to be missing a few library paths that don't get baked in by spack. I will report back to this list when we have fixed HPCToolkit's spack recipe so you can use our develop branch.
>
> Best,
>
> John
> --
> John Mellor-Crummey Professor
> Dept of Computer Science Rice University
> email: johnmc@xxxxxxxx phone: 713-348-5179
>
>
>
>> On May 12, 2025, at 10:26 AM, Daniel Kokron - NOAA Affiliate <daniel.kokron@xxxxxxxx> wrote:
>>
>> Ahhhh, that explains the following and how to get around it. Thank you.
>>
>> WARNING: Skipping DWARF for gfs_model.x, over threshold (377978416 > 104857600)
>>
>> On Mon, May 12, 2025 at 10:13 AM John Mellor-Crummey <johnmc@xxxxxxxx> wrote:
>> Daniel,
>>
>> One more thing:
>>
>> While we work on resolving the issue with hpcstruct, you should be able to run hpcprof on your measurement data even if hpcstruct failed to analyze this binary. hpcprof includes the ability to read DWARF (using a different library that shouldn't crash).
>>
>> When you run hpcprof, you should use
>>
>> hpcprof --dwarf-max-size=unlimited <measurement directory>
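
The warning quoted earlier in this thread shows a default limit of 104857600 bytes (100 MiB) against a binary of 377978416 bytes. A minimal sketch of checking a binary against that limit before deciding whether the flag is needed, assuming the limit is compared against the file size (which the numbers in this thread suggest but do not prove). The paths and the `exceeds_dwarf_limit` helper are hypothetical.

```shell
# Default DWARF size limit reported in the hpcprof warning in this thread (100 MiB).
DWARF_LIMIT=104857600

# Succeeds (exit 0) if the file is larger than the limit, i.e. its DWARF
# would presumably be skipped by default.
exceeds_dwarf_limit() {
    size=$(wc -c < "$1") || return 2
    size=$((size))   # normalize any whitespace padding from wc
    [ "$size" -gt "$DWARF_LIMIT" ]
}

# Hypothetical paths; replace with your binary and measurement directory.
binary=./gfs_model.x
meas_dir=./hpctoolkit-measurements
if [ -f "$binary" ] && exceeds_dwarf_limit "$binary"; then
    echo "hpcprof --dwarf-max-size=unlimited $meas_dir"
fi
```

The sketch only prints the suggested command; it does not run hpcprof itself.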
>>
>> Best,
>>
>> John
>> --
>> John Mellor-Crummey Professor
>> Dept of Computer Science Rice University
>> email: johnmc@xxxxxxxx phone: 713-348-5179
>>
>>
>>
>>> On May 12, 2025, at 10:05 AM, Daniel Kokron - NOAA Affiliate <daniel.kokron@xxxxxxxx> wrote:
>>>
>>> Got permission to share the executable. Link sent.
>>>
>>> On Fri, May 9, 2025 at 2:26 PM Daniel Kokron - NOAA Affiliate <daniel.kokron@xxxxxxxx> wrote:
>>> I'll ask about providing the executable.
>>>
>>> On Fri, May 9, 2025 at 1:52 PM John Mellor-Crummey <johnmc@xxxxxxxx> wrote:
>>> Hi Daniel,
>>>
>>> Thanks for the callstack.
>>>
>>> The problem seems to be exactly the same one recently encountered by Doug Pase for a Fortran program at Sandia. This is a problem inside the type processing by the Dyninst software written by our collaborators.
>>>
>>> Can you share a binary with us to facilitate debugging? The Sandia binary is export controlled and only accessible inside their firewall. Having a non-export controlled binary for debugging would make our lives easier.
>>>
>>> Best,
>>>
>>> John
>>> --
>>> John Mellor-Crummey Professor
>>> Dept of Computer Science Rice University
>>> email: johnmc@xxxxxxxx phone: 713-348-5179
>>>
>>>
>>>
>>>> On May 9, 2025, at 12:15 PM, Daniel Kokron - NOAA Affiliate <daniel.kokron@xxxxxxxx> wrote:
>>>>
>>>> The application is compiled with Intel ifort. HPCToolkit and its dependencies are compiled with gcc-13.2.1. I attached the spec for HPCToolkit.
>>>>
>>>>
>>>> (gdb) run --nocache /lfs/h1/hpc/support/daniel.kokron/Tickets/2025042910000034/sorc/ufs_model.fd/build_fv3_1/gfs_model.x
>>>> Starting program: /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/hpctoolkit-2024.01.1-a3im66mlumyu3hbzmeuor3kj3l553yau/bin/hpcstruct --nocache /lfs/h1/hpc/support/daniel.kokron/Tickets/2025042910000034/sorc/ufs_model.fd/build_fv3_1/gfs_model.x
>>>> Missing separate debuginfos, use: zypper install glibc-debuginfo-2.31-150300.63.1.x86_64
>>>> Missing separate debuginfo for /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/gcc-runtime-13.2.1-eo4evuugdi6s23do65dqomvbknlo4ong/lib/libstdc++.so.6
>>>> Try: zypper install -C "debuginfo(build-id)=c74eca671e2dd0f063706372d103f8acef88f1e3"
>>>> Missing separate debuginfo for /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/gcc-runtime-13.2.1-eo4evuugdi6s23do65dqomvbknlo4ong/lib/libgomp.so.1
>>>> Try: zypper install -C "debuginfo(build-id)=54684492738e640bcd600e830cee025dd8771a20"
>>>> Missing separate debuginfo for /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/gcc-runtime-13.2.1-eo4evuugdi6s23do65dqomvbknlo4ong/lib/libgcc_s.so.1
>>>> Try: zypper install -C "debuginfo(build-id)=12f775ec4aeb94b749897b1b65638f18b61d1b1f"
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> begin sequential analysis of CPU binary /lfs/h1/hpc/support/daniel.kokron/Tickets/2025042910000034/sorc/ufs_model.fd/build_fv3_1/gfs_model.x (size = 377978672, threads = 1)
>>>> hpcstruct: /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/boost-1.87.0-2cldxfpwec5rbbhxutja5lwcgzh6fbhc/include/boost/smart_ptr/shared_ptr.hpp:550: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = Dyninst::SymtabAPI::typeSubrange; typename boost::detail::sp_member_access<T>::type = Dyninst::SymtabAPI::typeSubrange*]: Assertion `px != 0' failed.
>>>>
>>>> Program received signal SIGABRT, Aborted.
>>>> 0x0000155553e2fd2b in raise () from /lib64/libc.so.6
>>>> (gdb) where
>>>> #0 0x0000155553e2fd2b in raise () from /lib64/libc.so.6
>>>> #1 0x0000155553e313e5 in abort () from /lib64/libc.so.6
>>>> #2 0x0000155553e27c6a in __assert_fail_base () from /lib64/libc.so.6
>>>> #3 0x0000155553e27cf2 in __assert_fail () from /lib64/libc.so.6
>>>> #4 0x0000155554d65127 in boost::enable_if<boost::integral_constant<bool, !((bool)boost::is_same<Dyninst::SymtabAPI::Type, Dyninst::SymtabAPI::typeSubrange>::value)>, boost::shared_ptr<Dyninst::SymtabAPI::Type> >::type Dyninst::SymtabAPI::typeCollection::addOrUpdateType<Dyninst::SymtabAPI::typeSubrange>(boost::shared_ptr<Dyninst::SymtabAPI::typeSubrange>) () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #5 0x0000155554d547e6 in Dyninst::SymtabAPI::DwarfWalker::parseSubrange() () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #6 0x0000155554d5a0a8 in Dyninst::SymtabAPI::DwarfWalker::parse_int(Dwarf_Die, bool, bool) () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #7 0x0000155554d5b235 in Dyninst::SymtabAPI::DwarfWalker::findAnyType(Dwarf_Attribute, bool, boost::shared_ptr<Dyninst::SymtabAPI::Type>&) ()
>>>> from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #8 0x0000155554d5b732 in Dyninst::SymtabAPI::DwarfWalker::findType(boost::shared_ptr<Dyninst::SymtabAPI::Type>&, bool) ()
>>>> from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #9 0x0000155554d5497b in Dyninst::SymtabAPI::DwarfWalker::parseArray() () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #10 0x0000155554d59fb8 in Dyninst::SymtabAPI::DwarfWalker::parse_int(Dwarf_Die, bool, bool) () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #11 0x0000155554d5a53b in Dyninst::SymtabAPI::DwarfWalker::parse_int(Dwarf_Die, bool, bool) () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #12 0x0000155554d5bab0 in Dyninst::SymtabAPI::DwarfWalker::parseModule(Dwarf_Die, Dyninst::SymtabAPI::Module*&) [clone .constprop.0] [clone .isra.0] ()
>>>> from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #13 0x0000155554d5c15c in Dyninst::SymtabAPI::DwarfWalker::parse() [clone ._omp_fn.0] () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #14 0x000015555403b306 in GOMP_parallel () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/gcc-runtime-13.2.1-eo4evuugdi6s23do65dqomvbknlo4ong/lib/libgomp.so.1
>>>> #15 0x0000155554d5d2ed in Dyninst::SymtabAPI::DwarfWalker::parse() () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #16 0x0000155554d0a0c1 in Dyninst::SymtabAPI::Object::parseTypeInfo() () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #17 0x0000155554cd48a7 in Dyninst::SymtabAPI::Symtab::parseTypes() () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #18 0x0000155553fef5d7 in __pthread_once_slow () from /lib64/libpthread.so.0
>>>> #19 0x0000155554ccc8b4 in Dyninst::SymtabAPI::Symtab::parseTypesNow() () from /lfs/h1/hpc/support/daniel.kokron/SPACK/spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/dyninst-13.0.0-74gjdp5yt432nk3wyv7dn7o45ovdw6hr/lib/libsymtabAPI.so.13.0
>>>> #20 0x00000000004418c8 in Inline::openSymtab (elfFile=elfFile@entry=0x8d94b0) at Struct-Inline.cpp:132
>>>> #21 0x000000000043cb31 in BAnal::Struct::makeStructure (filename=..., outFile=outFile@entry=0x830ab0, gapsFile=gapsFile@entry=0x0, gaps_filenm=..., search_path=..., structOpts=...) at Struct.cpp:770
>>>> #22 0x000000000042cb42 in doSingleBinary (args=..., sb=sb@entry=0x7ffffffd8740) at /usr/include/c++/13/bits/basic_string.tcc:238
>>>> #23 0x0000000000412cfd in realmain (argc=<optimized out>, argv=<optimized out>) at main.cpp:209
>>>> #24 0x000000000041220a in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:137
>>>>
>>>> On Fri, May 9, 2025 at 11:16 AM John Mellor-Crummey <johnmc@xxxxxxxx> wrote:
>>>> Hi Daniel,
>>>>
>>>> You should be able to run hpcstruct under gdb and then run it directly on the offending binary as follows
>>>>
>>>> gdb `which hpcstruct`
>>>> run --nocache /path/to/gfs_model
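
The same two steps can be done non-interactively, which makes it easy to capture the call path in a file. A minimal sketch, assuming hpcstruct is on PATH; `-batch`, `-ex`, and `--args` are standard gdb options, while `gdb_backtrace_cmd` is a helper name invented for this example. It only prints the command line so you can inspect it before running it.

```shell
# Print a gdb command line that runs hpcstruct on a binary and dumps a
# backtrace ("where") once the program stops, e.g. on SIGABRT.
gdb_backtrace_cmd() {
    printf 'gdb -batch -ex run -ex where --args hpcstruct --nocache %s\n' "$1"
}

# Same placeholder path as in the interactive recipe above.
gdb_backtrace_cmd /path/to/gfs_model
```

Running the printed line with output redirected (for example `> backtrace.txt 2>&1`) gives a file that can be attached to a reply. Paths containing spaces would need quoting, which this sketch does not handle.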
>>>>
>>>> Then, you can send us a call path. By any chance is this a Fortran code compiled with gfortran? We are presently looking into a complaint about that from Sandia.
>>>>
>>>> Best,
>>>>
>>>> John
>>>> --
>>>> John Mellor-Crummey Professor
>>>> Dept of Computer Science Rice University
>>>> email: johnmc@xxxxxxxx phone: 713-348-5179
>>>>
>>>>
>>>>
>>>>> On May 9, 2025, at 8:59 AM, Daniel Kokron - NOAA Affiliate via HPCToolkit-forum <hpctoolkit-forum@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> I am encountering the following error while running hpcstruct. I cannot find the core file in any of the usual places. I have also tried running hpcstruct under gdb without getting very far.
>>>>>
>>>>> Wondering what my debugging options are?
>>>>>
>>>>> begin concurrent analysis of CPU binary gfs_model. (size = 377978416, threads = 1)
>>>>> /bin/sh: line 32: 63480 Aborted (core dumped) /spack/opt/spack/linux-sles15-zen2/gcc-13.2.1/hpctoolkit-2024.01.1-a3im66mlumyu3hbzmeuor3kj3l553yau/bin/hpcstruct. --nocache -j 1 -o $struct_name -M $meas_dir /Baseline_6Hr_WithWW3Restarts_Trace.16774.rawdata/cpubins/model.x > $warn_name 2>&1
>>>>>
>>>>> Dan
>>>>> _______________________________________________
>>>>> HPCToolkit-forum mailing list
>>>>> HPCToolkit-forum@xxxxxxxxxxxxxxxx
>>>>> https://mailman.rice.edu/mailman/listinfo/hpctoolkit-forum
>>>>
>>>
>>
>