We propose a new algorithm for recovering asynchronously from failures in a
distributed computation. Our algorithm is based on two novel concepts:
a {\em fault-tolerant vector clock} to maintain causality information
in spite of failures, and a {\em history} mechanism to detect orphan states
and obsolete messages. These two mechanisms, together with
checkpointing and message logging, are used
to restore the system to a consistent state after the failure of
one or more processes. Our algorithm
is completely asynchronous.
It handles multiple failures and network partitioning,
does not assume any message ordering, causes the minimum amount of
rollback, and restores the maximum recoverable state with low overhead.
Earlier optimistic recovery protocols lack one or more of these properties.