Re: [DynInst_API:] How to use DyninstAPI to get thread/process call stack in case of a signal delivery


Date: Tue, 22 Apr 2014 15:29:08 +0000
From: "Lee, Greg" <lee218@xxxxxxxx>
Subject: Re: [DynInst_API:] How to use DyninstAPI to get thread/process call stack in case of a signal delivery

Jie,

 

DysectAPI is available in STAT 2.1, but only as a prototype: it is undocumented and most likely very buggy!  When you configure STAT, supply the --enable-dysectapi flag.  After building STAT, you will find example “sessions” in the “examples/sessions” directory.  From there, you can run:

 

% <your_stat_prefix>/bin/dysectc sess-06.cpp

% export STAT_GROUP_OPS=1

% <your_stat_prefix>/bin/stat-cl -X $PWD/libsess-06.so -L $HOME/logs -l FE -l BE -l CP -M -C srun -n 8 ../src/mpi_ringtopo2 20

 

The sess-06.cpp file (compiled and loaded by the commands above) should give you an idea of some of the general DysectAPI features.  The session below shows how to gather stack traces on a signal.  While the application is running, you can run `kill -10 <PID>` (signal 10 is SIGUSR1 on Linux x86) against one of the MPI processes to trigger the stack trace sampling.
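Since signal numbers differ between platforms, it may be safer to send the signal by name. The following is only a minimal sketch: APP_PID is a hypothetical variable standing in for the PID of one of the MPI tasks, and nothing here is specific to STAT or DysectAPI.

```shell
# Look up which signal the number 10 names on this system.
# It is USR1 on Linux x86, but the numbering differs on some platforms.
sig_name=$(kill -l 10)
echo "signal 10 is SIG${sig_name} on this system"

# APP_PID is a hypothetical variable holding the PID of one MPI task.
# Sending the signal by name avoids depending on its numeric value.
if [ -n "${APP_PID:-}" ]; then
  kill -s USR1 "$APP_PID"
fi
```

If APP_PID is unset, the sketch only reports the signal name and sends nothing.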

 

% cat onsigusr.cpp

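#include <LibDysectAPI.h>

DysectStatus DysectAPI::onProcStart() {
  // Probe: on SIGUSR1, wait 500 ms in case other processes in the
  // world domain get the signal too, then gather the STAT stack traces.
  Probe *p = new Probe(Async::signal(SIGUSR1),
                       Domain::world(500),
                       Act::stat());
  ProbeTree::addRoot(p);
  return DysectOK;
}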

 

% <your_stat_prefix>/bin/dysectc onsigusr.cpp

 

% export STAT_GROUP_OPS=1

 

% <your_stat_prefix>/bin/stat-cl -X $PWD/libonsigusr.so -L $HOME/logs -l FE -l BE  -C srun -n 8 ../src/mpi_ringtopo2 30

STAT started at 2014-04-22-08:18:33

Launching application and tool daemons...

Tool daemons launched and connected!

Attaching to application...

Attached!

Resuming the application...

Resumed!

 

## Prototype DysectAPI enabled ##

Notice: Traditional sampling is disabled troughout session!

Setting up frontend session '/g/g0/lee218/src/STAT/examples/sessions/libonsigusr.so'...

<Apr 22 08:18:33> DysectAPI Frontend: Verbose > Break on enter key: yes

<Apr 22 08:18:33> DysectAPI Frontend: Verbose > Break on timeout: no

<Apr 22 08:18:33> DysectAPI Frontend: Info > DysectAPI setup took 4 ms

Dysect session setup complete

Application already running... ignoring request to resume

Waiting for events (! denotes captured event)

Hit <enter> to stop session

rzmerl14, MPI task 1 of 8 stalling for 30 of 30 seconds

rzmerl14, MPI task 1 of 8 stalling for 20 of 30 seconds

 

Sampling traces...

Traces sampled!

Merging traces...

Traces merged!

srun: error: rzmerl14: task 4: User defined signal 1

rzmerl14, MPI task 1 of 8 stalling for 10 of 30 seconds

rzmerl14, MPI task 1 of 8 proceeding

srun: First task exited 30s ago

srun: tasks 0-3,5-7: running

srun: task 4: exited abnormally

srun: Terminating job step 1947661.2

slurmd[rzmerl14]: *** STEP 1947661.2 KILLED AT 2014-04-22T08:19:24 WITH SIGNAL 9 ***

srun: Job step aborted: Waiting up to 2 seconds for job step to finish.

slurmd[rzmerl14]: *** STEP 1947661.2 KILLED AT 2014-04-22T08:19:24 WITH SIGNAL 9 ***

<Apr 22 08:19:25> DysectAPI Frontend: Info > Stopping session - application has exited

Detaching from application...

Detached!

 

Results written to /g/g0/lee218/src/STAT/examples/sessions/stat_results/mpi_ringtopo2.0439

 

% <your_stat_prefix>/bin/stat-view stat_results/mpi_ringtopo2.0439/*.dot

 

 

In this example, the “Domain::world(500)” argument means that the probe is applied to all processes (the world).  Only one process needs to receive SIGUSR1 to trigger the probe.  The probe then waits 500 ms in case other processes receive the signal too, and after that it gathers the STAT stack trace.

 

Let me know how this works for you or if you have any questions.

 

                -Greg

 

From: JiangJie [mailto:yangtzj@xxxxxxxxxxx]
Sent: Tuesday, April 22, 2014 7:03 AM
To: Legendre, Matthew P.
Cc: dyninst-api@xxxxxxxxxxx; Lee, Greg
Subject: RE: [DynInst_API:] How to use DyninstAPI to get thread/process call stack in case of a signal delivery

 

 


> > I do in fact know STAT, including its internals, a fair bit. And this
> > should be a fairly obvious extension if you take a look at the STAT
> > back-end code, as everything it's doing is with ProcControlAPI and
> > Stackwalker directly. You'd simply want to register your signal handler
> > callback with ProcControl, and have that callback trigger a stack walk
> > with the back end's Walker.
>
> There's a new release of STAT 2.1 that includes the initial version of DysectAPI support. With DysectAPI you should be able to configure STAT to do a stackwalk on a signal.
>
>
> And if you still want to roll your own, Bill suggested the correct approach. More specifically:
>
> 1. Use StackwalkerAPI to attach to a process. You'll get a Walker object
> 2. From a Walker, you can use getProcessState() to get a ProcDebug object.
> 3. A ProcDebug has a getProc() call that gets you a ProcControlAPI Process handle.
> 4. With ProcControlAPI you can register a callback that can trigger on a signal. From that callback, you can use the Walker to take a Stackwalk.
>
> You'd also need to modify the STAT source to coordinate a global stack walk upon a signal. Right now STAT expects to take a stackwalk from every process when triggered. It doesn't support taking a stackwalk from just a small subset of processes that fault.
>
> -Matt
>

Hi Bill and Matthew,

Thanks for your helpful replies.

Today I took a look at the STAT-2.1 source code and the DysectAPI,
and I will try it later.

Is there any documentation for DysectAPI?
Can STAT-2.1 collect and merge the call stacks from all processes upon a signal,
in the same way the STAT FE handles call stacks on demand?
(Assume all processes get the same signal.)

Regards,
Jie




