Digital systems can fail in catastrophic ways leading to death or tremendous financial loss. Although their are many potential causes including physical failure, human error, and environmental factors, design errors are increasingly becoming the most serious culprit. The following recent events show the potential of design errors for disaster:
At Lockheed Martin in 1995, I found that the Lockheed Martin/Boeing F-22 will really be a computer with wings... If the software is dedtvered on time and on budget, and works as expected, the F-22 will be a success. If not, the F-22 will be in big trouble.
Thus software becomes the Achilles' heel of almost any aerospace defense project. To get it wrong means the whole project may go down the tubes.
On February 24, 1999, the PITAC issued its report, "Information Technology Research: Investing in Our Future," to President Clinton. They wrote:
Software research was judged by The President's Information Technology Advisory Committee to be the highest priority area for fundamental research. From the desktop computer to the phone system to the stock market, our economy and society have become increasingly reliant on software. This Committee concluded that not only is the demand for software exceeding our ability to produce it; the software that is produced today is fragile, unreliable, and difficult to design, test, maintain, and upgrade.
We are very good at building complex software systems that work 95% of the time. But we do not know how to build complex software systems that are ultra-reliably safe (i.e. P_f < 10-7/hour). Consequently, industries that build safety-critical systems have relied upon relatively simple system designs and minimized the amount of software that is safety-critical. They have also relied upon backups that are not software (e.g. a pilot that can take emergency action.) Nevertheless, even with this approach, the verification and certification of such systems has been enormously expensive. It is not unusual to hear that 60% or 70% of the cost of an avionics box is verification and certification cost.
The traditional way of verifying software systems is through human inspection, simulation, and testing. Unfortunately these approaches provide no guarantees about the quality of the software after it has been verified in this manner. Human inspection is limited by the abilities of the reviewers and simulation and testing can only explore a minuscule fraction of the state space of any real software system: