FM Past Program: Byzantine Agreement


This page is not maintained. It is probably out of date.

Fault-tolerant systems, although internally redundant, must deal with single-source information from the external world. For example, a flight control system is built around the notion of feedback from physical sensors such as accelerometers, position sensors, pressure sensors, etc. Although these can be replicated (and they usually are), the replicates do not produce identical results. In order to use bit-by-bit majority voting all of the computational replicates must operate on identical input data. Thus, the sensor values (the complete redundant suite) must be distributed to each processor in a manner which guarantees that all working processors receive exactly the same value even in the presence of some faulty processors. This is the classic Byzantine Generals problem.

CLI investigated the formal verification of such algorithms and their implementation. They formally verified the original Marshall, Shostak, and Lamport version of this algorithm using the Boyer-Moore theorem prover. They also implemented this algorithm down to the register-transfer level and demonstrated that it implements the mathematical algorithm. and then subsequently verified the design down to a hardware description language (HDL) developed at CLI.

CLI has reproduced the SRI verification of the interactive convergence algorithm using the Boyer-Moore theorem prover. CLI has also developed a formal model of asynchronous communication and demonstrated its utility by formally verifying a widely used protocol for asynchronous communication called the bi-phase mark protocol, also known as ``FM'' or ``single density''. It is one of several protocols implemented by microcontrollers such as the Intel 82530 and is used in the Intel 82C501AD Ethernet Serial Interface.

ORA has also investigated the formal verification of Byzantine Generals algorithms. They focused on the practical implementation of a Byzantine-resilient communications mechanism between Mini-Cayuga micro-processors. The Mini-Cayuga is a small but formally verified microprocessor developed by ORA. It is a research prototype and has not been fabricated. The communications circuitry would serve as a foundation for a fault-tolerant architecture. It was designed assuming that the underlying processors were synchronized (say by a clock synchronization circuit). The issues involved with connecting the Byzantine communications circuit with a clock synchronization circuit and verifying the combination has not yet been explored.

Thambidurai and Park introduced a fault model that classified faults into three categories: asymetric, symmetric and benign. They further suggested the need for and developed an algorithm that had capabilities beyond that of the earlier Byzantine generals algorithms. In particular, algorithms that can mask the effects of the less severe class of faults, in a more effective way. SRI has formally verified an improved version of their algorithm.


References


Curator and Responsible NASA Official: C. Michael Holloway
last modified: 2 August 2000 (10:13:03)