[os-reading] Practice talk for ASPLOS - Thursday 11:00am @4310


Date: Tue, 12 Mar 2013 18:17:19 -0500
From: Wei Zhang <wzh@xxxxxxxxxxx>
Subject: [os-reading] Practice talk for ASPLOS - Thursday 11:00am @4310
Hello all,

Wei Zhang will be giving a practice talk (on the topic of software failure recovery from concurrency bugs) for ASPLOS on March 14 (Thursday) from 11:00am to 12:00pm in room 4310. It would be great if you could come and give some feedback.

Thank you very much!
Wei

Abstract of the talk:
Many concurrency bugs are hidden in deployed software and cause
severe failures for end-users. When they finally manifest and become
known by developers, they are difficult to fix correctly. To support
end-users, we need techniques that help software survive hidden
concurrency bugs during production runs. To help developers, we
need techniques that fix exposed concurrency bugs.

The state-of-the-art techniques for concurrency-bug fixing and
survival satisfy only a subset of four important properties:
compatibility, correctness, generality, and performance. We aim to
develop a system that satisfies all four. To achieve this goal, we
leverage two observations: (1) rolling back a single thread is
sufficient to recover from most concurrency-bug failures; (2)
reexecuting an idempotent region, which requires no memory-state
checkpoint, is sufficient to recover from many concurrency-bug
failures. Our system, ConAir, includes a static analysis component
that automatically identifies potential failure sites, a static
analysis component that automatically identifies the idempotent
code regions around every failure site, and a code-transformation
component that inserts rollback-recovery code around the identified
idempotent regions.
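
To make the two observations concrete, here is a minimal hand-written
sketch in C of rolling back one thread by reexecuting an idempotent
region. It is only an illustration under stated assumptions, not
ConAir's actual generated code: the names shared_msg,
simulate_other_thread, read_with_recovery, and MAX_RETRIES are all
hypothetical, the second thread is simulated by a counter so the
example is deterministic, and setjmp/longjmp is used here simply as
one convenient way to jump back to the start of the region.

```c
#include <setjmp.h>
#include <stddef.h>

/* Shared state that another thread is supposed to initialize
 * (an order-violation bug: the reader may run first). */
static const char *volatile shared_msg = NULL;

/* Hypothetical stand-in for the other thread: it publishes the
 * value on its third chance to run. A real program would have an
 * actual second thread here. */
static int writer_delay = 3;
static void simulate_other_thread(void) {
    if (writer_delay > 0 && --writer_delay == 0)
        shared_msg = "ready";
}

#define MAX_RETRIES 10   /* assumed bound so recovery cannot spin forever */

/* Returns the message length, or -1 if recovery gives up.
 * *retries_out receives the number of reexecutions performed. */
static int read_with_recovery(int *retries_out) {
    jmp_buf region_start;
    volatile int retries = 0;   /* volatile: must survive longjmp */

    /* Start of the idempotent region. longjmp lands here again. */
    if (setjmp(region_start) != 0)
        retries++;

    simulate_other_thread();    /* models interleaving, not part of the region */

    /* The region itself only reads shared memory and writes a local,
     * so rerunning it needs no memory-state checkpoint. */
    const char *local = shared_msg;

    /* Failure site: dereferencing NULL would crash. Instead of
     * failing, roll this one thread back and reexecute. */
    if (local == NULL) {
        if (retries < MAX_RETRIES)
            longjmp(region_start, 1);
        *retries_out = retries;
        return -1;
    }

    int len = 0;
    while (local[len] != '\0')
        len++;
    *retries_out = retries;
    return len;
}
```

Calling read_with_recovery(&n) once on a fresh program state reexecutes
the region until the simulated writer has published the value. The
retry bound is a design choice of this sketch, so that a failure not
caused by a timing issue still surfaces instead of looping.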

We evaluated ConAir on 10 real-world concurrency bugs in
widely used C/C++ open-source applications. These bugs cover
different types of failure symptoms and root causes. Quantitatively,
ConAir helps software survive failures caused by all of these
bugs with negligible run-time overhead (<1%) and short recovery
time. Qualitatively, ConAir can help recover from failures caused
by unknown bugs. It guarantees that program semantics remain
unchanged and requires no change to operating systems or hardware.