In the trace I see a sequence that looks like:
[linux.C:167-G] - Stopped with signal 19
[generator.C:209-G] - Got event
[generator.C:144-G] - Setting generator state to decoding
[generator.C:144-G] - Setting generator state to statesync
[generator.C:144-G] - Setting generator state to queueing
[generator.C:144-G] - Setting generator state to none
[generator.C:144-G] - Setting generator state to process_blocked
I've never seen this before, and I'm not sure what happened. It
almost looks like ProcControlAPI got an event that it couldn't
understand. I wonder if this is the missing event from the new
thread. I'd suggest focusing on this and seeing if you can trace what
happened.
A few minutes after I wrote this I realized what the core problem is.
ProcControlAPI keeps track of "dead threads" in the ProcPool, and uses
this list to suppress events that trickle in from dead multi-threaded
processes (we'd sometimes see Linux feed us queued-up debug events from
threads after a process's main thread dies). As Josh suggested, we're
likely seeing TID reuse and mis-identifying the new thread's event as a
lingering event from a dead thread.
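To make the failure mode concrete, here is a rough sketch of a dead list
that suppresses purely by TID. NaiveDeadList and its methods are
hypothetical names for illustration, not the actual ProcPool code, and
the real bookkeeping is surely more involved.

```cpp
#include <set>
#include <sys/types.h>

// Hypothetical stand-in for the dead-thread list (not the real ProcPool).
class NaiveDeadList {
public:
    // Record a TID whose thread has exited.
    void markDead(pid_t tid) { dead_.insert(tid); }

    // Suppress any event whose TID is on the dead list, whatever the event is.
    // If the kernel hands the same TID to a new thread, that thread's first
    // event is dropped here as if it were a leftover from the old thread.
    bool shouldSuppress(pid_t tid) const { return dead_.count(tid) != 0; }

private:
    std::set<pid_t> dead_;
};
```

With a filter like this, an event from a freshly created thread that
happens to reuse a dead TID is indistinguishable from a stale one.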
This makes a tremendous amount of sense.
I seem to recall we've discussed the dead thread tracking problem before
and not come up with any good way to distinguish a recycled TID from a
dead one that legitimately should be suppressed, but this test case
suggests one simple and obvious solution: discard non-thread-create
events from dead threads, and (obviously) remove threads from the dead
list when their TID becomes live again.
Any problems with this approach that you guys see?
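For discussion, a minimal sketch of the proposed rule, assuming the
event can be classified as a thread-create before the dead list is
consulted. EventKind, DeadThreadFilter, and the method names are made up
for illustration and are not ProcControlAPI types.

```cpp
#include <set>
#include <sys/types.h>

// Hypothetical event classification; the real decoder has its own types.
enum class EventKind { ThreadCreate, Stop, Exit, Other };

// Hypothetical dead-thread filter implementing the proposed rule.
class DeadThreadFilter {
public:
    // Record a TID whose thread has exited.
    void markDead(pid_t tid) { dead_.insert(tid); }

    bool isDead(pid_t tid) const { return dead_.count(tid) != 0; }

    // Returns true if the event should be discarded.
    bool shouldSuppress(pid_t tid, EventKind kind) {
        auto it = dead_.find(tid);
        if (it == dead_.end())
            return false;              // TID is not on the dead list
        if (kind == EventKind::ThreadCreate) {
            dead_.erase(it);           // TID was recycled: it is live again
            return false;              // let the create event through
        }
        return true;                   // stale event from a dead thread
    }

private:
    std::set<pid_t> dead_;
};
```

Once a create event clears the entry, later events on that TID pass
through normally. The sketch still drops a new thread's event if it
arrives before the create event for the recycled TID, so event ordering
would need checking.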
-Matt
--
--bw
Bill Williams
Paradyn Project
bill@xxxxxxxxxxx