> Do I need to use a compute node?
Yes, you will need to use a compute node to run the tool. It executes a small CUDA program to determine the location of the synchronization function in libcuda. Without a CUDA-capable graphics card, this test program will likely exit immediately, which would produce the error you are seeing. I would try running it on a compute node first before doing any other debugging.
I have submitted a bug report on this issue, because the tool should print a warning when it is run on a system without a CUDA-capable graphics card instead of failing with an opaque error ( https://github.com/dyninst/tools/issues/15 ).
> X86 with GCC 8.3.0
This should be fine: there are no known issues with the tool or Dyninst under GCC 8.3. However, I have CC'd Tim Haines in case there is some issue with Dyninst and GCC 8.3 that I am not aware of.
> What else can go wrong here?
There should be no issue. As mentioned, the kernel runtime limit was very unlikely to apply to your machine, but I figured it was worth mentioning in case the machine had some really strange setup.
Ben
Hello Ben and Nisarg,
Thank you for your help.
> This test program is rewritten by the tool (using dyninst) and executed. Was there a core file that was created for a program called hang_devsync?

I do not have any core file for "hang_devsync".
> In any case there are three likely causes of this test program crashing: 1) injecting the wrong libcuda.so into the test program. This can occur if a parallel file system is in use and it contains a libcuda that differs from the driver version in use by a compute node (note: despite its name, libcuda is not part of the CUDA toolkit, it is part of the GPU driver package itself). Check to make sure the libcuda the tool is detecting and injecting into the program matches the libcuda version applications run on the node actually use (simplest way to check this is to manually run hang_devsync on the compute node under GDB and check using info shared what libcuda was dlopen'd by libcudart, this path should match what was displayed by the tool in its log).
In both cases I use the same library. My installation was done on the login
nodes, which do not have GPUs. Do I need to use a compute node?
> 2) Dyninst instrumentation error. What platform (x86, PPC, etc.) are you using this tool on?
x86. I use JUWELS [1].
> What version of Dyninst are you using?

v10.1.0-41-g194dda7
> What version of GCC/Clang is being used for compilation of Dyninst?
GCC 8.3.0
(CMake and make logs attached)
> 3) (unlikely given that you appear to be running on a cluster) as Nisarg mentioned, there is a timeout for CUDA kernels that run longer than 5 seconds on machines that are using the Nvidia card as a display adapter. This is a problem for the test program which spin locks on a single kernel for a long time. You can test if this is an issue by directly launching hang_devsync and seeing if it exits (this program will never return if it is working correctly).

"hang_devsync" exits immediately when I execute it. And our GPU experts
say that there is no such thing as a kernel runtime limit on JUWELS.
What else can go wrong here?
Thanks,
Ilya
[1]
https://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUWELS/Configuration/Configuration_node.html
On 11.05.20 16:33, Benjamin Welton wrote:
> Hello Ilya,
>
> As Nisarg mentioned, the likely issue here is that the test program that
> is launched to determine the location of the internal synchronization
> function (hang_devsync) did not complete (most likely it crashed).
>
> This test program is rewritten by the tool (using dyninst) and executed.
> Was there a core file that was created for a program called hang_devsync?
>
> In any case there are three likely causes of this test program crashing:
> 1) injecting the wrong libcuda.so into the test program. This can occur
> if a parallel file system is in use and it contains a libcuda that
> differs from the driver version in use by a compute node (note: despite
> its name, libcuda is not part of the CUDA toolkit, it is part of the
> GPU driver package itself). Check to make sure the libcuda the tool is
> detecting and injecting into the program matches the libcuda version
> applications run on the node actually use (simplest way to check this is
> to manually run hang_devsync on the compute node under GDB and check
> using info shared what libcuda was dlopen'd by libcudart, this path
> should match what was displayed by the tool in its log).
>
> 2) Dyninst instrumentation error. What platform (x86, PPC, etc.) are you
> using this tool on? What version of Dyninst are you using? What version
> of GCC/Clang is being used for compilation of Dyninst?
>
> 3) (unlikely given that you appear to be running on a cluster) as Nisarg
> mentioned, there is a timeout for CUDA kernels that run longer than 5
> seconds on machines that are using the Nvidia card as a display adapter.
> This is a problem for the test program which spin locks on a single
> kernel for a long time. You can test if this is an issue by directly
> launching hang_devsync and seeing if it exits (this program will never
> return if it is working correctly).
>
> Ben
>
> On Mon, May 11, 2020, 12:21 AM NISARG SHAH <nisargs@xxxxxxxxxxx> wrote:
>
> Thanks Ilya!
>
> It looks like the instrumentation that figures out the synchronization
> function in CUDA did not run completely to the end (it takes around
> 20-30 minutes to finish).
>
> Do you know if the segfault occurs immediately (within 4-5s) after
> the last line is printed to screen ("Inserting signal start instra
> in main")? If this is so, the cause of error might be CUDA's kernel
> runtime limit. You might need to increase or disable it altogether.
>
>
> Regards
> Nisarg
>
> ------------------------------------------------------------------------
> *From:* Ilya Zhukov
> *Sent:* Sunday, May 10, 2020 4:52 AM
> *To:* NISARG SHAH; dyninst-api@xxxxxxxxxxx
> *Subject:* Re: [DynInst_API:] mutateLibcuda segfaults
>
> Hi Nisarg,
>
> I do not have an "MS_outputids.bin" file, but I do have 5 *.dot files in
> the directory where I ran the program.
>
> Cheers,
> Ilya
>
> On 09.05.20 00:15, NISARG SHAH wrote:
> > Hi Ilya,
> >
> > From the backtrace, it looks like the error is due to the program not
> > being able to read from a temporary file "MS_outputids.bin" that it
> > creates initially. Can you check if it exists in the directory from
> > where you ran the program? Also, can you check if 5 *.dot files are
> > present in the same directory?
> >
> > Thanks
> > Nisarg
> >
> > ------------------------------------------------------------------------
> > *From:* Dyninst-api <dyninst-api-bounces@xxxxxxxxxxx> on behalf of Ilya
> > Zhukov <i.zhukov@xxxxxxxxxxxxx>
> > *Sent:* Wednesday, May 6, 2020 7:16 AM
> > *To:* dyninst-api@xxxxxxxxxxx
> > *Subject:* [DynInst_API:] mutateLibcuda segfaults
> >
> > Dear Dyninst developers,
> >
> > I'm testing your cuda_sync_analyzer tool on our cluster for CUDA/10.1.105.
> >
> > I installed Dyninst and cuda_sync_analyzer (CMake and make logs attached)
> > successfully. But I get a segmentation fault when I create the fake CUDA library.
> >
> > Here is a backtrace
> >> #0 0x00002b0a9658c4bc in fseek () from /usr/lib64/libc.so.6
> >> #1 0x00002b0a93b7eb29 in LaunchIdentifySync::PostProcessing (this=this@entry=0x7fff1af88af0, allFound=...) at /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/LaunchIdentifySync.cpp:90
> >> #2 0x00002b0a93b7c00f in CSA_FindSyncAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) () at /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/FindCudaSync.cpp:34
> >> #3 0x00000000004021fb in main () at /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/main.cpp:15
> >> #4 0x00002b0a96537505 in __libc_start_main () from /usr/lib64/libc.so.6
> >> #5 0x000000000040253e in _start () at /p/project/cslts/zhukov1/work/tools/dyninst/tools/cuda_sync_analyzer/src/main.cpp:38
> >
> > Any help will be appreciated. If you need anything else let me know.
> >
> > Best wishes,
> > Ilya
> > --
> > Ilya Zhukov
> > Juelich Supercomputing Centre
> > Institute for Advanced Simulation
> > Forschungszentrum Juelich GmbH
> > 52425 Juelich, Germany
> >
> > Phone: +49-2461-61-2054
> > Fax: +49-2461-61-2810
> > E-mail: i.zhukov@xxxxxxxxxxxxx
> > WWW: http://www.fz-juelich.de/jsc
>