Re: [DynInst_API:] Hung process


Date: Fri, 20 Feb 2015 10:50:02 -0600
From: Barton Miller <bart@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] Hung process
Good job everyone!

--bart



On Feb 20, 2015, at 10:30 AM, Gerard <ggarcia@xxxxxxxxxxxx> wrote:

I have tested the patch with my original application and it does indeed work, so it seems this really was the problem. Thank you very much. I'll do further testing next week.

Gerard


2015-02-19 19:20 GMT+01:00 Bill Williams <bill@xxxxxxxxxxx>:
...well, the simple and obvious solution at least initially appears to work. Patch attached; it'll show up on mainline assuming nothing goes horribly wrong with further testing.


On 02/18/2015 03:41 PM, Bill Williams wrote:
In the trace I see a sequence that looks like:

[linux.C:167-G] - Stopped with signal 19
[generator.C:209-G] - Got event
[generator.C:144-G] - Setting generator state to decoding
[generator.C:144-G] - Setting generator state to statesync
[generator.C:144-G] - Setting generator state to queueing
[generator.C:144-G] - Setting generator state to none
[generator.C:144-G] - Setting generator state to process_blocked

I've never seen this before, and I'm not sure what happened.  It
almost looks like ProcControlAPI got an event that it couldn't
understand.  I wonder if this is the missing event from the new
thread.  I'd suggest focusing on this and seeing if you can trace what
happened.

A few minutes after I wrote this I realized what the core problem is.
ProcControlAPI keeps track of "dead threads" in the ProcPool, and uses
this list to suppress events that trickle in from dead multi-threaded
processes (we'd sometimes see Linux feed us queued-up debug events from
threads after a process's main thread dies).  As Josh suggested, we're
likely seeing TID reuse and misidentifying the new thread as a lingering
event from a dead thread.

This makes a tremendous amount of sense.

I seem to recall we've discussed the dead thread tracking problem before
and not come up with any good way to distinguish a recycled TID from a
dead one that legitimately should be suppressed, but this test case
suggests one simple and obvious solution: discard non-thread-create
events from dead threads, and (obviously) remove threads from the dead
list when their TID becomes live again.
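
To make the proposal concrete, here is a rough sketch of the filtering rule.
It is not the attached patch and does not use ProcControlAPI's real types:
DeadThreadFilter, EventKind, and tidReuseDemo are hypothetical names invented
for illustration, and the dead list is assumed to be a plain set of TIDs.

```cpp
#include <cassert>
#include <set>

// Hypothetical event kinds; ProcControlAPI's real event types differ.
enum class EventKind { ThreadCreate, Stop, Exit, Other };

// Hypothetical stand-in for the dead-thread list kept in the ProcPool.
class DeadThreadFilter {
public:
    // Record a TID whose thread has exited.
    void markDead(int tid) { dead_.insert(tid); }

    // True if the TID is currently on the dead list.
    bool isDead(int tid) const { return dead_.count(tid) != 0; }

    // Returns true if the event should be delivered, false if suppressed.
    bool shouldDeliver(int tid, EventKind kind) {
        if (!isDead(tid)) {
            return true;  // live or unknown TID: nothing to suppress
        }
        if (kind == EventKind::ThreadCreate) {
            // The kernel recycled the TID: the thread is live again,
            // so take it off the dead list and deliver the event.
            dead_.erase(tid);
            return true;
        }
        // Any other event for a dead TID is treated as a stale leftover.
        return false;
    }

private:
    std::set<int> dead_;
};

// Walks through the TID-reuse scenario discussed in this thread.
inline void tidReuseDemo() {
    DeadThreadFilter filter;
    filter.markDead(1234);

    // A late stop event from the dead thread is suppressed.
    assert(!filter.shouldDeliver(1234, EventKind::Stop));

    // A thread-create event with the same TID revives it.
    assert(filter.shouldDeliver(1234, EventKind::ThreadCreate));
    assert(!filter.isDead(1234));

    // Later events from the new thread are delivered normally.
    assert(filter.shouldDeliver(1234, EventKind::Stop));
}
```

The sketch assumes the thread-create event for a recycled TID always arrives
before any other event from the new thread; if that ordering does not hold,
the first few events from the new thread would still be dropped.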

Any problems with this approach that you guys see?

-Matt




--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx

_______________________________________________
Dyninst-api mailing list
Dyninst-api@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
