Re: [DynInst_API:] mutateLibcuda segfaults


Date: Fri, 29 May 2020 16:13:44 +0200
From: Ilya Zhukov <i.zhukov@xxxxxxxxxxxxx>
Subject: Re: [DynInst_API:] mutateLibcuda segfaults
Hi Ben,

sorry for the long silence. It doesn't mean everything went
successfully; it was caused by the unavailability of our systems. We had a
security incident and all our systems went offline two weeks ago. Next
week I can (hopefully) test mutateLibcuda again.

Thanks for your reply; I'll let you know about my results soon.

Best wishes,
Ilya

On 11.05.20 22:34, Benjamin Welton wrote:
>> Do I need to use a compute node?
> 
> Yes, you will need to use a compute node to run the tool. It executes a
> small cuda program to determine the location of the synchronization
> function in libcuda. Without a CUDA-capable graphics card, this test
> program will likely exit immediately and give the error you are
> seeing. I would try running this on a compute node before doing
> any other debugging.
> 
> I have submitted a bug report on this issue because we should print a
> warning when the tool is run on a system without a CUDA-capable graphics
> card instead of failing with a random error
> (https://github.com/dyninst/tools/issues/15).
> 
>> X86 with GCC 8.3.0
> 
> This should be fine; there are no known issues with the tool or Dyninst
> with GCC 8.3. However, I have CC'd Tim Haines here in case there is some
> issue with Dyninst and GCC 8.3 that I am not aware of.
> 
>> What else can go wrong here?
> 
> There should be no issue. As mentioned, the kernel runtime limit was
> very unlikely to apply to your machine, but I figured it was worth
> mentioning in case the machine had some really strange setup.
> 
> Ben
> 
> 
> 
> 
> On Mon, May 11, 2020 at 2:52 PM Ilya Zhukov <i.zhukov@xxxxxxxxxxxxx> wrote:
> 
>     Hello Ben and Nisarg,
> 
>     thank you for your help.
> 
>     > This test program is rewritten by the tool (using dyninst) and
>     executed. Was there a core file that was created for a program
>     called hang_devsync?
>     I do not have any core file for "hang_devsync".
> 
>     > In any case there are three likely causes of this test program
>     crashing: 1) injecting the wrong libcuda.so into the test program.
>     This can occur if a parallel file system is in use and it contains a
>     libcuda that differs from the driver version in use by a compute
>     node (note: despite its name, libcuda is not part of the CUDA
>     toolkit; it is part of the GPU driver package itself). Check to make
>     sure the libcuda the tool is detecting and injecting into the
>     program matches the libcuda version applications run on the node
>     actually use (the simplest way to check this is to manually run
>     hang_devsync on the compute node under GDB and check using "info
>     shared" what libcuda was dlopen'd by libcudart; this path should
>     match what was displayed by the tool in its log).
>     In both cases I use the same library. My installation was on the login
>     nodes where I do not have GPUs. Do I need to use a compute node?
> 
>     > 2) Dyninst instrumentation error. What platform (x86, PPC, etc.) are
>     you using this tool on?
>     x86. I use JUWELS [1].
>     > What version of Dyninst are you using?
>     v10.1.0-41-g194dda7
>     > What version of GCC/Clang is being used for compilation of Dyninst?
>     GCC 8.3.0
>     (cmake/make logs attached)
> 
>     > 3) (unlikely given that you appear to be running on a cluster) as
>     Nisarg mentioned, there is a timeout for cuda kernels that run
>     longer than 5 seconds on machines that are using the Nvidia card as a
>     display adapter. This is a problem for the test program, which spin
>     locks on a single kernel for a long time. You can test if this is an
>     issue by directly launching hang_devsync and seeing if it exits
>     (this program will never return if it is working correctly).
>     "hang_devsync" exits immediately when I execute it. And our GPU experts
>     say that there is no such thing as a kernel runtime limit on JUWELS.
>     What else can go wrong here?
> 
>     Thanks,
>     Ilya
> 
>     [1]
>     https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUWELS/Configuration/Configuration_node.html
> 
>     On 11.05.20 16:33, Benjamin Welton wrote:
>     > Hello Ilya,
>     >
>     > As Nisarg mentioned, the likely issue here is that the test
>     program that
>     > is launched to determine the location of the internal synchronization
>     > function (hang_devsync) did not complete (most likely it crashed).
>     >
>     > This test program is rewritten by the tool (using dyninst) and
>     executed.
>     > Was there a core file that was created for a program called
>     hang_devsync?
>     >
>     > In any case there are three likely causes of this test program
>     crashing:
>     > 1) injecting the wrong libcuda.so into the test program. This can
>     occur
>     > if a parallel file system is in use and it contains a libcuda that
>     > differs from the driver version in use by a compute node (note:
>     despite
>     > its name, libcuda is not part of the CUDA toolkit; it is part of the
>     > GPU driver package itself). Check to make sure the libcuda the tool is
>     > detecting and injecting into the program matches the libcuda version
>     > applications run on the node actually use (the simplest way to check
>     this is
>     > to manually run hang_devsync on the compute node under GDB and check
>     > using "info shared" what libcuda was dlopen'd by libcudart; this path
>     > should match what was displayed by the tool in its log).
>     >
>     > 2) Dyninst instrumentation error. What platform (x86, PPC, etc.) are you
>     > using this tool on? What version of Dyninst are you using? What
>     version
>     > of GCC/Clang is being used for compilation of Dyninst?
>     >
>     > 3) (unlikely given that you appear to be running on a cluster) as
>     Nisarg
>     > mentioned, there is a timeout for cuda kernels that run longer than 5
>     > seconds on machines that are using the Nvidia card as a display
>     adapter.
>     > This is a problem for the test program, which spin locks on a single
>     > kernel for a long time. You can test if this is an issue by directly
>     > launching hang_devsync and seeing if it exits (this program will never
>     > return if it is working correctly).
>     >
>     > Ben
>     >
>     > On Mon, May 11, 2020, 12:21 AM NISARG SHAH <nisargs@xxxxxxxxxxx> wrote:
>     >
>     >     Thanks Ilya!
>     >
>     >     It looks like the instrumentation that figures out the synchronization
>     >     function in CUDA did not run completely to the end (it takes
>     around
>     >     20-30 minutes to finish).
>     >
>     >     Do you know if the segfault occurs immediately (within 4-5s) after
>     >     the last line is printed to screen ("Inserting signal start instra
>     >     in main")? If so, the cause of the error might be CUDA's
>     kernel
>     >     runtime limit. You might need to increase or disable it
>     altogether.
>     >
>     >
>     >     Regards
>     >     Nisarg
>     >
>     >
>     ------------------------------------------------------------------------
>     >     *From:* Ilya Zhukov
>     >     *Sent:* Sunday, May 10, 2020 4:52 AM
>     >     *To:* NISARG SHAH; dyninst-api@xxxxxxxxxxx
>     >     *Subject:* Re: [DynInst_API:] mutateLibcuda segfaults
>     >
>     >     Hi Nisarg,
>     >
>     >     I do not have an "MS_outputids.bin" directory, but I have 5 *.dot
>     files in
>     >     the directory where I ran the program.
>     >
>     >     Cheers,
>     >     Ilya
>     >
>     >     On 09.05.20 00:15, NISARG SHAH wrote:
>     >     > Hi Ilya,
>     >     >
>     >     > From the backtrace, it looks like the error is due to the
>     program not
>     >     > being able to read from a temporary file "MS_outputids.bin"
>     that it
>     >     > creates initially. Can you check if it exists in the
>     directory from
>     >     > where you ran the program? Also, can you check if 5 *.dot
>     files are
>     >     > present in the same directory?
>     >     >
>     >     > Thanks
>     >     > Nisarg
>     >     >
>     >     >
>     ------------------------------------------------------------------------
>     >     > *From:* Dyninst-api <dyninst-api-bounces@xxxxxxxxxxx> on behalf of
>     >     > Ilya Zhukov <i.zhukov@xxxxxxxxxxxxx>
>     >     > *Sent:* Wednesday, May 6, 2020 7:16 AM
>     >     > *To:* dyninst-api@xxxxxxxxxxx
>     >     > *Subject:* [DynInst_API:] mutateLibcuda segfaults
>     >     >
>     >     > Dear dyninst developers,
>     >Â Â Â>
>     >     > I'm testing your cuda_sync_analyzer tool on our cluster for
>     CUDA/10.1.105.
>     >Â Â Â>
>     >     > I installed dyninst and cuda_sync_analyzer (cmake and make
>     logs attached)
>     >     > successfully, but I get a segmentation fault when I create the
>     fake CUDA library.
>     >Â Â Â>
>     >     > Here is a backtrace:
>     >     >> #0  0x00002b0a9658c4bc in fseek () from /usr/lib64/libc.so.6
>     >     >> #1  0x00002b0a93b7eb29 in
>     LaunchIdentifySync::PostProcessing (this=this@entry=0x7fff1af88af0,
>     allFound=...) at
>     /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/LaunchIdentifySync.cpp:90
>     >     >> #2  0x00002b0a93b7c00f in
>     CSA_FindSyncAddress(std::__cxx11::basic_string<char,
>     std::char_traits<char>, std::allocator<char> >&) () at
>     /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/FindCudaSync.cpp:34
>     >     >> #3  0x00000000004021fb in main () at
>     /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/main.cpp:15
>     >     >> #4  0x00002b0a96537505 in __libc_start_main () from
>     /usr/lib64/libc.so.6
>     >     >> #5  0x000000000040253e in _start () at
>     /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/main.cpp:38
>     >     >
>     >     > Any help will be appreciated. If you need anything else, let
>     me know.
>     >     >
>     >     > Best wishes,
>     >     > Ilya
>     >     > --
>     >     > Ilya Zhukov
>     >     > Juelich Supercomputing Centre
>     >     > Institute for Advanced Simulation
>     >     > Forschungszentrum Juelich GmbH
>     >     > 52425 Juelich, Germany
>     >     >
>     >     > Phone: +49-2461-61-2054
>     >     > Fax: +49-2461-61-2810
>     >     > E-mail: i.zhukov@xxxxxxxxxxxxx
>     >     > WWW: http://www.fz-juelich.de/jsc
>     >
> 

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature
