Abstract
We present a new approach that uses compiler-directed fault injection for coverage testing of recovery code in Internet services, in order to evaluate their robustness to operating-system and I/O hardware faults. We define a set of program-fault coverage metrics that quantify how thoroughly Java catch blocks are exercised during fault-injection experiments. We use compiler analyses to instrument application code in two ways: to direct fault injection to appropriate points during execution, and to measure the resulting coverage. As a proof of concept, we applied these techniques manually to Muffin, a proxy server, and obtained high coverage of catch blocks: on average, 85% of the faults expected at each catch block were actually observed there as caught exceptions.
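The per-catch coverage figure can be read as a simple ratio. For each catch block, one set holds the faults that the analysis expects to reach it, and a second set holds the faults actually observed there as caught exceptions. The average is then taken over all catch blocks. The Java sketch below illustrates one plausible reading of that metric. It is not the authors' implementation. The class name `CatchCoverage`, its methods, the unweighted averaging, and the catch and fault identifiers are all hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a per-catch fault-coverage metric (not the paper's code).
 * expected: faults a static analysis predicts should be handled by each catch block.
 * observed: the subset of those faults actually seen at runtime as caught exceptions.
 */
public final class CatchCoverage {
    private final Map<String, Set<String>> expected = new LinkedHashMap<>();
    private final Map<String, Set<String>> observed = new LinkedHashMap<>();

    /** Record that fault faultId is expected to be caught by catch block catchId. */
    public void expect(String catchId, String faultId) {
        expected.computeIfAbsent(catchId, k -> new HashSet<>()).add(faultId);
    }

    /** Record (e.g. from instrumentation inside the catch) that faultId was caught at catchId. */
    public void observe(String catchId, String faultId) {
        Set<String> exp = expected.get(catchId);
        // Ignore faults that were not expected for this catch block.
        if (exp != null && exp.contains(faultId)) {
            observed.computeIfAbsent(catchId, k -> new HashSet<>()).add(faultId);
        }
    }

    /** Fraction of expected faults observed at one catch block; 0 if nothing was expected. */
    public double coverage(String catchId) {
        Set<String> exp = expected.getOrDefault(catchId, Set.of());
        if (exp.isEmpty()) {
            return 0.0;
        }
        int seen = observed.getOrDefault(catchId, Set.of()).size();
        return (double) seen / exp.size();
    }

    /** Unweighted mean of per-catch coverage over every catch block with expected faults. */
    public double averageCoverage() {
        if (expected.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String catchId : expected.keySet()) {
            sum += coverage(catchId);
        }
        return sum / expected.size();
    }

    public static void main(String[] args) {
        CatchCoverage cc = new CatchCoverage();
        // Hypothetical catch blocks and fault sites.
        cc.expect("catchA", "socket.read");
        cc.expect("catchA", "socket.write");
        cc.expect("catchB", "file.open");
        cc.observe("catchA", "socket.read");
        cc.observe("catchB", "file.open");
        System.out.println(cc.averageCoverage()); // prints 0.75
    }
}
```

In this example, `catchA` sees one of its two expected faults (0.5) and `catchB` sees its only expected fault (1.0), so the mean is 0.75. A weighted variant that counts faults rather than catch blocks would give a different number, and it is not clear from the abstract alone which one the paper uses.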