Debugging Distributed Programs Using Controlled Re-execution

Distributed programs are hard to write. A distributed debugger equipped with the mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem, called predicate control problem, which takes a computation and a safety property specified on the computation, and outputs a controlled computation that maintains the property. We define a class of global predicates, called region predicates, that can be controlled efficiently in a distributed computation. We prove that the synchronization generated by our algorithm is optimal. Further, we introduce the notion of an admissible sequence of events and prove that it is equivalent to the notion of predicate control. We then give an efficient algorithm for the class of disjunctive predicates based on the notion on an admissible sequence.