Hello all,
Wei Zhang will be giving a practice talk (on the topic of software
failure recovery from concurrency bugs) for ASPLOS on March 14
(Thursday) from 11:00am to 12:00pm in room 4310. It would be great if
you could come and give some feedback.
Thank you very much!
Wei
Abstract of the talk:
Many concurrency bugs are hidden in deployed software and cause
severe failures for end-users. When they finally manifest and become
known to developers, they are difficult to fix correctly. To support
end-users, we need techniques that help software survive hidden
concurrency bugs during production runs. To help developers, we
need techniques that fix exposed concurrency bugs.
State-of-the-art techniques for concurrency-bug fixing and
survival each satisfy only a subset of four important properties:
compatibility, correctness, generality, and performance. We aim to
develop a system that satisfies all four of these properties. To
achieve this goal, we leverage two observations: (1) rolling back a
single thread is sufficient to recover from most concurrency-bug
failures; (2) reexecuting an idempotent region, which requires no
memory-state checkpoint, is sufficient to recover from many
concurrency-bug failures. Our system ConAir includes a static analysis
component that automatically identifies potential failure sites, a static
analysis component that automatically identifies the idempotent code
regions around every failure site, and a code-transformation component
that inserts rollback-recovery code around the identified idempotent
regions.
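
To make the second observation concrete, here is a minimal single-threaded
sketch in C of re-executing an idempotent region around a failure site. It
is an illustration only, not ConAir's generated code: the use of
setjmp/longjmp as the rollback mechanism and the names read_shared, backoff,
and MAX_RETRIES are all assumptions made for this example, and the "other
thread" is simulated by a counter so the sketch stays deterministic.

```c
#include <setjmp.h>
#include <stddef.h>

#define MAX_RETRIES 1000  /* hypothetical bound on re-execution attempts */

/* Shared pointer that another thread is expected to publish (simulated). */
static int *volatile shared_buf = NULL;
static int backing_value = 42;
static int yields = 0;

/* Hypothetical stand-in for yielding to the other thread. In this
   simulation the "other thread" publishes the buffer on the third yield. */
static void backoff(void) {
    if (++yields == 3) {
        shared_buf = &backing_value;
    }
}

/* Reads *shared_buf into *out. The region between setjmp and the failure
   site only reads shared state and writes a local that is reassigned on
   every pass, so re-running it needs no memory-state checkpoint.
   Returns 0 on success, -1 if the retry budget is exhausted. */
int read_shared(int *out) {
    jmp_buf reexec_point;
    volatile int retries = 0;  /* volatile: modified between setjmp and longjmp */
    int *p;

    setjmp(reexec_point);      /* start of the idempotent region */
    p = shared_buf;            /* shared read, local write */

    if (p == NULL) {           /* failure site: would otherwise crash below */
        if (retries < MAX_RETRIES) {
            retries++;
            backoff();                 /* recovery code, outside the region */
            longjmp(reexec_point, 1);  /* roll back this thread only */
        }
        return -1;             /* give up and report the failure */
    }

    *out = *p;
    return 0;
}
```

In this sketch the first two passes observe a NULL pointer and roll back;
the third pass sees the published buffer and the read completes. Only the
failing thread is rolled back, and nothing is checkpointed, which mirrors
the two observations above under the stated assumptions.
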
We evaluated ConAir on 10 real-world concurrency bugs in
widely used C/C++ open-source applications. These bugs cover
different types of failure symptoms and root causes. Quantitatively,
ConAir helps software survive failures caused by all of these
bugs with negligible run-time overhead (<1%) and short recovery
time. Qualitatively, ConAir can help recover from failures caused
by unknown bugs. It guarantees that program semantics remain
unchanged and requires no change to operating systems or hardware.