The runtime-monitoring aspect of survivable software is based on Software Monitoring with Controllable Overhead (SMCO). Given a target overhead O, SMCO maximizes the confidence the user can place in the monitoring results while ensuring that the overhead due to monitoring never exceeds O. Key research challenges include extending the SMCO framework to the domain of embedded control software and using aspect-oriented programming to specify and implement runtime monitors.
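The overhead bound described above can be illustrated with a minimal sketch. This is not the SMCO controller itself: it is a simple budget check, under the assumptions that overhead is measured as monitoring cost divided by application cost and that the cost of observing an event is known before the event is observed. The class name `OverheadBudget` and its methods are hypothetical and chosen only for illustration.

```python
class OverheadBudget:
    """Hypothetical helper: decides per event whether monitoring is affordable."""

    def __init__(self, target_overhead):
        if target_overhead < 0.0:
            raise ValueError("target overhead must be non-negative")
        self.target = target_overhead  # O, as a fraction of application cost
        self.base_time = 0.0           # accumulated application (unmonitored) cost
        self.monitor_time = 0.0        # accumulated monitoring cost
        self.observed = 0              # events that were monitored
        self.skipped = 0               # events that were not monitored

    def overhead(self):
        """Monitoring cost as a fraction of application cost so far."""
        if self.base_time == 0.0:
            return 0.0
        return self.monitor_time / self.base_time

    def step(self, base_cost, monitor_cost):
        """Account for one event; monitor it only if the bound O still holds.

        Returns True if the event was monitored, False if it was skipped.
        Assumption: monitor_cost is known (or conservatively estimated)
        before the decision is made.
        """
        self.base_time += base_cost
        if self.monitor_time + monitor_cost <= self.target * self.base_time:
            self.monitor_time += monitor_cost
            self.observed += 1
            return True
        self.skipped += 1
        return False


# Illustrative run: O = 25%, each event costs 1.0 and monitoring it costs 0.5.
budget = OverheadBudget(target_overhead=0.25)
for _ in range(1000):
    budget.step(base_cost=1.0, monitor_cost=0.5)
```

With these illustrative numbers, every second event is monitored and the measured overhead ends at exactly 0.25. A controller in the spirit of SMCO would additionally try to spend the available budget on the events that contribute most to confidence, which this sketch does not attempt.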
The fault containment and recovery aspect of survivable software is based on Hierarchical Simplified Redundancy (HSR), a new software architecture for embedded applications. Applying HSR to an embedded system means that, for each critical software module, there is a hierarchy of backup modules ready to take over execution from a failed module. Backup modules are designed for maximum verifiability and survivability, and need only suffice to ensure continued operation of the overall system. Stability analysis techniques ensure that a backup module sees a consistent system state when it takes over.
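The backup hierarchy described above can be sketched as an ordered chain of modules, from the most capable primary to the simplest backup. The sketch below is an illustration under stated assumptions, not the HSR architecture itself: a failure is assumed to show up as a raised exception or as an output rejected by a caller-supplied check, and all names (`FailoverChain`, `primary`, `backup`, `safe_mode`, `output_ok`) are hypothetical. The stability analysis that guarantees a consistent state at takeover is outside its scope.

```python
class AllModulesFailed(Exception):
    """Raised when every module in the hierarchy has failed."""


class FailoverChain:
    """Hypothetical sketch: run the active module, fall back on failure."""

    def __init__(self, modules, output_ok):
        # modules: list of (name, function) pairs, primary first,
        # followed by progressively simpler backups.
        self.modules = list(modules)
        self.output_ok = output_ok  # assumed runtime check on module output
        self.active = 0             # index of the module currently in control

    def step(self, state):
        """Return (module_name, output) from the first module that succeeds."""
        while self.active < len(self.modules):
            name, fn = self.modules[self.active]
            try:
                output = fn(state)
                if self.output_ok(output):
                    return name, output
            except Exception:
                # Any exception is treated as a module failure in this sketch.
                pass
            # The failed module is retired; control moves down the hierarchy.
            self.active += 1
        raise AllModulesFailed("no module left to take over")


def primary(state):
    # Stands in for a complex controller that has suffered a fault.
    raise RuntimeError("simulated fault in primary module")


def backup(state):
    # Simplified proportional controller with a clamped command.
    return max(-1.0, min(1.0, -0.5 * state["error"]))


def safe_mode(state):
    # Simplest backup: always issue a neutral command.
    return 0.0


chain = FailoverChain(
    [("primary", primary), ("backup", backup), ("safe_mode", safe_mode)],
    output_ok=lambda out: -1.0 <= out <= 1.0,
)
result = chain.step({"error": 1.0})
```

In this run the primary fails on the first call, so `result` is `("backup", -0.5)` and the backup stays in control for later calls. Retiring a failed module permanently is a design choice of the sketch; a real system might instead allow the primary to be restored after recovery.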
This effort has the potential to usher in a new era of safety- and mission-critical software, namely survivable software. Case studies involving flight software and other related embedded software are being conducted, thereby informing the research in a manner directly applicable to the mission of the USAF.