Re: [DynInst_API:] Hung process


Date: Wed, 18 Feb 2015 15:41:29 -0600
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Hung process
In the trace I see a sequence that looks like:

[linux.C:167-G] - Stopped with signal 19
[generator.C:209-G] - Got event
[generator.C:144-G] - Setting generator state to decoding
[generator.C:144-G] - Setting generator state to statesync
[generator.C:144-G] - Setting generator state to queueing
[generator.C:144-G] - Setting generator state to none
[generator.C:144-G] - Setting generator state to process_blocked

I've never seen this before, and I'm not sure what happened.  It
almost looks like ProcControlAPI got an event that it couldn't
understand.  I wonder if this is the missing event from the new
thread.  I'd suggest focusing on this and seeing if you can trace what
happened.

A few minutes after I wrote this I realized what the core problem is.
ProcControlAPI keeps track of "dead threads" in the ProcPool, and uses
this list to suppress events that trickle in from dead multi-threaded
processes (we'd sometimes see Linux feed us queued up debug events from
threads after a process's main thread dies).  As Josh suggested, we're
likely seeing TID reuse and mis-identifying the new thread as a lingering
event from a dead thread.

This makes a tremendous amount of sense.

I seem to recall we've discussed the dead thread tracking problem before and not come up with any good way to distinguish a recycled TID from a dead one that legitimately should be suppressed, but this test case suggests one simple and obvious solution: discard non-thread-create events from dead threads, and (obviously) remove threads from the dead list when their TID becomes live again.
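
To make the proposal concrete, here is a rough sketch of the filter I have in mind. This is not the actual ProcPool code: DeadThreadFilter, EventKind, markDead, and shouldDeliver are hypothetical names made up for illustration, and I'm assuming we can tell a thread-create event apart from everything else at the point where we consult the dead list.

```cpp
#include <set>

// Hypothetical event classification; ProcControlAPI's real event types differ.
enum class EventKind { ThreadCreate, Stop, Exit, Other };

// Hypothetical stand-in for the dead-thread list kept in the ProcPool.
class DeadThreadFilter {
public:
    // Record that a thread has died so late events from it can be suppressed.
    void markDead(int tid) { dead_.insert(tid); }

    bool isDead(int tid) const { return dead_.count(tid) != 0; }

    // Returns true if the event should be delivered, false if it should be
    // dropped as a straggler from a dead thread.
    bool shouldDeliver(int tid, EventKind kind) {
        if (dead_.count(tid) == 0)
            return true;              // TID is not on the dead list
        if (kind == EventKind::ThreadCreate) {
            dead_.erase(tid);         // TID was recycled: it is live again
            return true;
        }
        return false;                 // lingering event from a dead thread
    }

private:
    std::set<int> dead_;
};
```

With this, a stop event for a dead TID is dropped, a thread-create for the same TID is delivered and takes the TID off the dead list, and later events for that TID go through normally.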

Any problems with this approach that you guys see?

-Matt


--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx