Date: | Mon, 11 May 2020 09:33:55 -0500 |
---|---|
From: | Benjamin Welton <welton@xxxxxxxxxxx> |
Subject: | Re: [DynInst_API:] mutateLibcuda segfaults |
Hello Ilya,
As Nisarg mentioned, the likely issue here is that the test program launched to determine the location of the internal synchronization function (hang_devsync) did not complete (most likely it crashed). This test program is rewritten by the tool (using Dyninst) and then executed. Was a core file created for a program called hang_devsync?

In any case, there are three likely causes of this test program crashing:

1) Injecting the wrong libcuda.so into the test program. This can occur if a parallel file system is in use and it contains a libcuda that differs from the driver version in use on a compute node (note: despite its name, libcuda is not part of the CUDA toolkit; it is part of the GPU driver package itself). Check that the libcuda the tool detects and injects into the program matches the libcuda that applications running on the node actually use. The simplest way to check this is to manually run hang_devsync on the compute node under GDB and use info shared to see which libcuda was dlopen'd by libcudart; this path should match the one displayed by the tool in its log.

2) Dyninst instrumentation error. What platform (x86, PPC, etc.) are you using this tool on? What version of Dyninst are you using? What version of GCC/Clang is being used to compile Dyninst?

3) (Unlikely, given that you appear to be running on a cluster.) As Nisarg mentioned, there is a timeout for CUDA kernels that run longer than 5 seconds on machines that use the Nvidia card as a display adapter. This is a problem for the test program, which spin-locks on a single kernel for a long time. You can test whether this is the issue by launching hang_devsync directly and seeing if it exits (this program will never return if it is working correctly).

Ben
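For the libcuda check in cause 1, the comparison can also be scripted. The following is only a rough sketch and not part of the tool: instead of GDB's info shared, it reads the Linux /proc/<pid>/maps file of a hang_devsync process that is still running and lists the libcuda it has mapped. The function names (loaded_libcuda_paths, same_library, check_process) are hypothetical, and the path the tool logged has to be copied in by hand.

```python
import os
import re

# Matches libcuda.so, libcuda.so.1, libcuda.so.<driver version>,
# but not libcudart.so (the CUDA runtime from the toolkit).
_LIBCUDA_RE = re.compile(r"^libcuda\.so(\.\d+)*$")


def loaded_libcuda_paths(maps_text):
    """Return the distinct libcuda paths found in a /proc/<pid>/maps dump."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if _LIBCUDA_RE.match(os.path.basename(path)) and path not in paths:
            paths.append(path)
    return paths


def same_library(loaded_path, logged_path):
    """Compare two paths after resolving symlinks on this node."""
    return os.path.realpath(loaded_path) == os.path.realpath(logged_path)


def check_process(pid, logged_path):
    """Assumed workflow: pid is a running hang_devsync, logged_path is the
    libcuda path copied from the tool's log. Returns a list of
    (mapped_path, matches_logged_path) pairs."""
    with open(f"/proc/{pid}/maps") as f:
        loaded = loaded_libcuda_paths(f.read())
    return [(p, same_library(p, logged_path)) for p in loaded]
```

If check_process returns a pair whose second element is False, the node is loading a different libcuda than the one the tool reported, which is the mismatch described in cause 1. An empty list means no libcuda was mapped at the time of the check.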