Re: [DynInst_API:] Hung process


Date: Thu, 19 Feb 2015 12:20:49 -0600
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Hung process
...well, the simple and obvious solution at least initially appears to work. Patch attached; it'll show up on mainline assuming nothing goes horribly wrong with further testing.

On 02/18/2015 03:41 PM, Bill Williams wrote:
In the trace I see a sequence that looks like:

[linux.C:167-G] - Stopped with signal 19
[generator.C:209-G] - Got event
[generator.C:144-G] - Setting generator state to decoding
[generator.C:144-G] - Setting generator state to statesync
[generator.C:144-G] - Setting generator state to queueing
[generator.C:144-G] - Setting generator state to none
[generator.C:144-G] - Setting generator state to process_blocked

I've never seen this before, and I'm not sure what happened.  It
almost looks like ProcControlAPI got an event that it couldn't
understand.  I wonder if this is the missing event from the new
thread.  I'd suggest focusing on this and seeing if you can trace what
happened.

A few minutes after I wrote this I realized what the core problem is.
ProcControlAPI keeps track of "dead threads" in the ProcPool, and uses
this list to suppress events that trickle in from dead multi-threaded
processes (we'd sometimes see Linux feed us queued-up debug events from
threads after a process's main thread dies).  As Josh suggested, we're
likely seeing TID reuse and mis-identifying the new thread's event as a
lingering event from a dead thread.
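
To make the failure mode concrete, here is a minimal sketch of a dead-thread filter that suppresses every event for a listed TID. The names (DeadThreadFilter, EventKind, markDead, shouldSuppress) are hypothetical stand-ins for illustration, not the ProcControlAPI types; the only assumption carried over from the discussion is that the lookup is keyed on the LWP alone.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins for illustration; not the ProcControlAPI types.
using LWP = int;
enum class EventKind { ThreadCreate, Stop, Exit };

struct DeadThreadFilter {
    std::set<LWP> dead;

    void markDead(LWP lwp) { dead.insert(lwp); }

    // Assumed pre-patch behavior: any event whose LWP is on the dead list
    // is dropped, regardless of what kind of event it is.
    bool shouldSuppress(LWP lwp, EventKind /*kind*/) const {
        return dead.count(lwp) != 0;
    }
};

// Returns true if the create event for a recycled TID would be dropped.
bool reusedTidIsDropped() {
    DeadThreadFilter filter;
    filter.markDead(4242);  // thread 4242 exits and is remembered as dead
    // Later the kernel hands TID 4242 to a brand-new thread; its create
    // event is indistinguishable from a stale one under this filter.
    return filter.shouldSuppress(4242, EventKind::ThreadCreate);
}
```

With a filter like this, reusedTidIsDropped() returns true: the new thread's create event is silently discarded, which would be consistent with the tracer waiting forever on a thread it never learned about.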

This makes a tremendous amount of sense.

I seem to recall we've discussed the dead thread tracking problem before
and not come up with any good way to distinguish a recycled TID from a
dead one that legitimately should be suppressed, but this test case
suggests one simple and obvious solution: discard non-thread-create
events from dead threads, and (obviously) remove threads from the dead
list when their TID becomes live again.

Any problems with this approach that you guys see?
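
A rough sketch of the proposed rule, under simplifying assumptions: DeadThreadTracker, EventKind and admit are hypothetical names, and the create event is assumed to carry the new thread's LWP directly (in the attached patch the removal is keyed on the child's pid in the thread-create branch of the decoder).

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins for illustration; not the ProcControlAPI types.
using LWP = int;
enum class EventKind { ThreadCreate, Stop, Exit };

class DeadThreadTracker {
public:
    void addDead(LWP lwp) { dead_.insert(lwp); }
    bool isDead(LWP lwp) const { return dead_.count(lwp) != 0; }

    // Returns true if the event should be delivered, false if discarded.
    bool admit(LWP lwp, EventKind kind) {
        if (kind == EventKind::ThreadCreate) {
            // A create event means the TID is live again, so stop
            // treating it as dead and let the event through.
            dead_.erase(lwp);
            return true;
        }
        // Anything else from a TID still on the dead list is stale.
        return !isDead(lwp);
    }

private:
    std::set<LWP> dead_;
};
```

Usage follows the scenario in this thread: after addDead(4242), admit(4242, EventKind::Stop) returns false; admit(4242, EventKind::ThreadCreate) returns true and clears the entry, so later events for the recycled TID are delivered normally.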

-Matt




--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx
diff --git a/proccontrol/src/linux.C b/proccontrol/src/linux.C
index e376861..8653b2f 100644
--- a/proccontrol/src/linux.C
+++ b/proccontrol/src/linux.C
@@ -110,19 +110,19 @@ Generator *Generator::getDefaultGenerator()
 
 bool GeneratorLinux::initialize()
 {
-   int result;
+    int result;
    
-   sigset_t usr2_set;
-   sigemptyset(&usr2_set);
-   sigaddset(&usr2_set, SIGUSR2);
-   result = pthread_sigmask(SIG_UNBLOCK, &usr2_set, NULL);
-   if (result != 0) {
-      perr_printf("Unable to unblock SIGUSR2: %s\n", strerror(result));
-   }
+    sigset_t usr2_set;
+    sigemptyset(&usr2_set);
+    sigaddset(&usr2_set, SIGUSR2);
+    result = pthread_sigmask(SIG_UNBLOCK, &usr2_set, NULL);
+    if (result != 0) {
+	perr_printf("Unable to unblock SIGUSR2: %s\n", strerror(result));
+    }
    
-   generator_lwp = P_gettid();
-   generator_pid = P_getpid();
-   return true;
+    generator_lwp = P_gettid();
+    generator_pid = P_getpid();
+    return true;
 }
 
 bool GeneratorLinux::canFastHandle()
@@ -278,10 +278,6 @@ bool DecoderLinux::decode(ArchEvent *ae, std::vector<Event::ptr> &events)
       lproc = dynamic_cast<linux_process *>(proc);
    }
 
-   if (ProcPool()->deadThread(archevent->pid)) {
-      return true;
-   }
-
    if (!proc) {
       pthrd_printf("Warning: could not find event for process %d\n", archevent->pid);
    }
@@ -675,17 +671,23 @@ bool DecoderLinux::decode(ArchEvent *ae, std::vector<Event::ptr> &events)
       else 
          assert(0);
       event->setSyncType(Event::sync_thread);
+      ProcPool()->removeDeadThread(child->pid);
       delete parent;
       delete child;
    }
    else {
-      //Single event decoded
-      assert(event);
-      assert(!parent);
-      assert(!child);
-      assert(proc->proc());
-      assert(thread->thread());
-      delete archevent;
+       if (archevent && ProcPool()->deadThread(archevent->pid)) {
+	   delete archevent;
+	   return true;
+       }
+
+       //Single event decoded
+       assert(event);
+       assert(!parent);
+       assert(!child);
+       assert(proc->proc());
+       assert(thread->thread());
+       delete archevent;
    }
    event->setThread(thread->thread());
    event->setProcess(proc->proc());
diff --git a/proccontrol/src/procpool.C b/proccontrol/src/procpool.C
index 08b653b..dd60c3f 100644
--- a/proccontrol/src/procpool.C
+++ b/proccontrol/src/procpool.C
@@ -145,6 +145,11 @@ bool ProcessPool::deadThread(Dyninst::LWP lwp) {
 void ProcessPool::addDeadThread(Dyninst::LWP lwp) {
    deadThreads.insert(lwp);
 }
+void ProcessPool::removeDeadThread(Dyninst::LWP lwp) {
+    // Called when we get a LWP create, as that had *better*
+    // not be for an already-dead thread.
+    deadThreads.erase(lwp);
+}
 
 unsigned ProcessPool::numProcs()
 {
diff --git a/proccontrol/src/procpool.h b/proccontrol/src/procpool.h
index c560252..0438a15 100644
--- a/proccontrol/src/procpool.h
+++ b/proccontrol/src/procpool.h
@@ -62,6 +62,7 @@ class ProcessPool
    // On Linux, we can get notifications for dead threads. Fun. 
    bool deadThread(Dyninst::LWP lwp);
    void addDeadThread(Dyninst::LWP lwp);
+   void removeDeadThread(Dyninst::LWP lwp);
    unsigned numProcs();
    bool LWPIDsAreUnique();
    bool for_each(ifunc f, void *data = NULL);