[RFC] SMT Plan

### Executive Summary

This RFC outlines an implementation plan for adding a standard and easily switchable SMT backend to CBMC. The goal of this work is to address two main areas. The first is to improve performance of CBMC by being able to both: use (runtime) switchable SMT solvers, and be extendable to exploit different theories. The second is for CBMC to have a high quality (clean, clear, robust, extensible, and maintainable) interaction with SMT solvers. This document includes the main details and measures for delivery of the project; the project plan; and key milestones and deliverables to be performed in a mostly iterative manner over vertical slices.

## Project Goals

The motivation for this project is to improve CBMC performance via SMT solvers. This has two main underlying aspects that both need to be addressed for this project to achieve long term success. The first is to allow CBMC to support (runtime) switchable SMT solvers and support multiple different theories. The second is to ensure that the SMT interaction with CBMC is high quality (clean, clear, robust, extensible and maintainable). These motivations drive the rest of the project detailed below.

The goal of this project is thus to allow CBMC to support multiple SMT solvers by using a common interface (SMT2LIB) and communication via a configurable interface (e.g. ports or pipes). The goals relating to the quality of the interaction should be taken into account and is somewhat reflected in the deliverables and overall project plan (incrementally developing a whole new SMT2 interaction rather than simply making small changes to the current SMT translation that is NOT clear, clean, robust, or maintainable).

The measures for delivery of the project are the following:

1. CBMC will have an interface based on SMT2LIB that allows communication with any SMT2LIB compliant SMT solver.
2. CBMC will be able to take the SMT solver as a configuration option and thus allow different SMT2LIB compliant solvers to be chosen at CBMC invocation time (not compile time).
3. CBMC is able to exploit different theories in the SMT solver to improve results. (Note that the current SMT interaction already supports more SMT theories than simply SAT, but this goal is to ensure improvement and parity.)

Note that these are not able to capture the detail of the quality, but quality should be kept in mind as it is one of the motivations of this project.

## Project Deliverables & Milestones

The plan for this project is to do vertical slices of the functionality with the goal that each step along the way can be used, even if this is an incomplete fragment. The rest of this section overviews both the iterative process for each vertical slice, as well as some details on what the slices are and how to progress through them. Note that while the main body of the project is an iteration with sub-deliverables, there are five others: two earlier ones (Foundation and Communication) and three later ones (Parsing SMT Output, Runtime Swapping Solvers and Final Steps). Only Foundation and Final Steps are strictly at the beginning or end, although Communication must occur before Runtime Swapping Solvers.


Below is a list of the SSA types and the SMT theories they will be translated into taken from knowledge of the current SMT translation in CBMC. These provide a useful starting point for a list of slices that can be implemented.

- Integer types: BitVec
- Floating point types: BitVec or FloatingPoint (with --fpa)
- Pointers: BitVec
- Structs: flattened per field into BitVec or data types (with --z3)
- Unions: BitVec
- Arrays: Array
- String literals: Array of BitVec

Note that additional slices may be considered, however the above provide at least the coverage of the existing SMT solver usage of CBMC.

The rest of this section details the plan and deliverables that can be used to iterate through the vertical slices, including an initial Foundation stage and two concluding stages.


### Foundation

This is a one-off part of the project that needs to be completed as a foundation for the rest of the SMT development. This involves adding a new flag to turn on the new functionality and ensuring that when this flag is enabled CBMC will attempt to use the new SMT backend.

Note that the invocation of SMT solvers through the new code path implemented here will be using incremental solving. This should be set as the default for how the interaction with all new SMT solvers will be conducted.

At the completion of the Foundation deliverable CBMC will be able to be invoked with the new flag. CBMC will behave normally when invoked with the new flag until the solver is invoked (or some other unimplemented feature is encountered), at which point the default unimplemented behaviour will occur, which is to report the error and terminate.

Once the Foundation is complete the (see below) Iteration deliverable (and sub-deliverables) will be iterated upon for each of the vertical slices. The following deliverable on Communication can be done at any point after the Foundation deliverable.

### Communication

The Communication deliverable will implement invoking the first chosen SMT solver (here Z3) and setting up connection between CBMC and SMT solver.

Part of the Communication deliverable is to also choose how the communication and invocation of the SMT solver will be done. To be completed this must also allow CBMC to receive responses from the SMT solver.

Note that this is distinct from the Foundation and the previous approach used by CBMC since one of the project goals is to have more uniformly swappable SMT solvers. The main purpose of the Communication deliverable is to ensure this is supported and designed with good quality.

This deliverable should first be implemented for Linux and macOS, with later iterations of this deliverable supporting the communication on Windows.

### Iteration (per Slice)

Note that for each slice the scope/target is already defined (see discussion at the start of the Project Deliverables & Milestones section).
The rest of this section overviews the parts of an iteration. Also in each sub-deliverable is some text (in _italics_) for an example of doing a slice for Integers.

#### Tests
This is the first step in each iteration and the deliverable for this is a collection of tests that demonstrate the capabilities and limitations (i.e. scope) of the slice. This includes all the structures and examples that should be covered.

Note that when creating the tests for a given slice this should also consider (or use) the existing regression tests that should be solvable when this slice is complete (including building on prior slices).

_For example, the Integer tests would include all the basic Integer operations. This would ensure coverage of both the translation of the operations on Integers, and also be able to validate that the SMT solver is understanding them correctly (i.e. no bugs in translation of configuration). Note that this would also include ensuring the correct handling of Integers when considering control flow (so the first slice must include basic control flow, and all following slices must ensure their handling of control flow). Similarly, casting to and from all previous (potentially castable types) and Integers are included in the Integer slice._

#### Internal Structure

This step is to implement a CBMC internal structure for the primitive chosen that can be mapped to the appropriate SMT2 representation. This should be based upon the existing irep structure, and provide a well defined structure for the primitive of this slice. For example, an internal irep-based structure to represent integers

This deliverable is complete when all the structure/forms required for this slice can be represented by the internal irep-based structure.

_For example, the internal structure for Integers must be able to represent all the base structures for Integers themselves, as well as all Integer unique structures. Further, all the existing structures may need to be extended to also account for the new Integer structures (e.g. if the plus or assign operator is already defined, it must now be extended to accept plus or assign of Integers)._

#### Internal Structure Output (Pretty Printing)

This step is to convert the internal structure into a suitable format for the SMT2 solver (e.g. a string). This should ideally also be usable for output in debugging.

This is complete when all of the internal structure as defined in the previous step can be output in an SMT2LIB format suitable for the SMT solver.

_For example, the output would not be able to output all the internal structures added in the previous sub-deliverable. This would also include understanding how to output in the correct format for the SMT2LIB representation of Integers._

#### SMT2LIB Translation

This step is to do the translation from CBMC’s SSA representation into the new structure that maps to the SMT2LIB format.

This deliverable is complete when all the identified/appropriate CBMC internal SSA expressions are able to be translated into the SMT2LIB internal irep-based structure.

Note that this does not require the previous sub-deliverable (Internal Structure Output) to be complete, however this may be hard to debug and evaluate without the Internal Structure Output.

_For example, this should not be able to handle all the tests generated before or extended with Integers and be able to translate them into the internal structure (including with the new Integer internal structures). This will include both writing new Integer specific translation functions, but also extending prior functions that before may have reported errors when passed Integer values (e.g. assign may not have been defined for Integers prior to this slice). Similarly, the functions created/extended here must also be written to handle (by reporting an error) incorrect usage that should be fixed in future slices (e.g. assignment to an Integer from an Array -- assuming the array slice has not been done prior to the Integer slice)._

#### Validation & Testing

The final deliverable is validation that all the above fit together and that the expected tests are passed. This is an explicit deliverable since it is expected there will be bugs that are found in final validation and testing that need to be repaired. This may also involve the creation of further tests that catch new/interesting cases.

_For example, this would involve ensuring all the tests developed for this slice and all previous slices that passed before still pass. Bugs or errors detected along the way would need to be fixed. Further, testing against the general regression tests should be done to ensure Integers are generally handled there and to evaluate the status and stability of the slice after Integers are added._


**This ends the iteration part. Note that this is designed so that the project can be paused at any point and can also progress to the following three steps, although the Final Step may be undesirable if SMT2 coverage is limited.**


### Parsing SMT Output

This deliverable is to enable parsing of the SMT solver output back into a useful internal form for cbmc. This involves replacing/rewriting the existing (hand crafted parser) with a parse-generator version (e.g. flex/bison).

This deliverable is complete when the output from SMT can be robustly handled to produce the traces necessary for cbmc traces. This should also include significant testing of the parser, i.e. unit tests and end-to-end tests.

### Runtime Swapping Solvers

This deliverable is to extend the Communication step to be able to support multiple SMT solvers using a simple flag or option. Ideally the above will turn out to be suitable SMT2 solver agnostic that any SMT2LIB conformant solver can simply be added here with a simple change to how CBMC invokes (and/or maybe interacts with) to allow runtime invocation and swapping of SMT solvers.

This deliverable will be complete when CBMC can take a runtime argument to run with the chosen SMT2-compliant solver. Note that this deliverable also involves implementing, testing, and completing the communication on all support operating systems.

This may involve some small amount of work to provide preamble/configuration details that CBMC can invoke to ensure the runtime choice is as seamless as possible.

Note, this should include CI for validation that the multiple solvers are supported. This should include at least: Z3 and one other solver (to be decided, e.g. CVC4, Boolector, Mathsat5, Yices2).

### Final Step

The final step is to remove/deprecate the old SMT solver(s) and use the new SMT interaction and translation developed here. This should only be considered when suitable coverage is achieved with the new SMT2 translation and communication.

Note that this should also involve all the CI and other details related to changing the default behaviour of CBMC to use the new solver.

Note: this should be sure to reinstate the cprover_init removal done in the Foundation if it has not been (re)enabled before.

### (Bonus) Dynamic SMT Solver Configuration

This deliverable is to enable more dynamic and subtle control of the SMT solver(s) via some kind of configurations either inside CBMC or through a configuration option that CBMC can pass through. The goal is to be able to pass through knowledge or information to the (specific?) SMT solver that can improve the performance.

## Progress Summary

This is a very brief summary of the progress so far with ~strikethrough~ indicating this is complete and merged:

- ~Foundation~
- ~Communication~
- Iteration (per slice)
- - ~Integer types: BitVec~
- - Floating point types: BitVec or FloatingPoint (with --fpa)
- - ~Pointers: BitVec~
- - Structs: flattened per field into BitVec or data types (with --z3)
- - Unions: BitVec
- - ~Arrays: Array~
- - String literals: Array of BitVec (untested)
- ~Parsing SMT output (per slice above)~
- ~Runtime Swappable Solvers~
- Final Step
- - Deprecate old SMT backend
- - ~CI for new solvers (Z3, CVC5)~
- ~(Bonus) Dynamic SMT Solver Configuration~ by invocation command as cbmc argument

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RFC] SMT Plan #6134

Executive Summary

Project Goals

Project Deliverables & Milestones

Foundation

Communication

Iteration (per Slice)

Tests

Internal Structure

Internal Structure Output (Pretty Printing)

SMT2LIB Translation

Validation & Testing

Parsing SMT Output

Runtime Swapping Solvers

Final Step

(Bonus) Dynamic SMT Solver Configuration

Progress Summary

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RFC] SMT Plan #6134

Description

Executive Summary

Project Goals

Project Deliverables & Milestones

Foundation

Communication

Iteration (per Slice)

Tests

Internal Structure

Internal Structure Output (Pretty Printing)

SMT2LIB Translation

Validation & Testing

Parsing SMT Output

Runtime Swapping Solvers

Final Step

(Bonus) Dynamic SMT Solver Configuration

Progress Summary

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions