Proposal of Interface between Kani/CProver tools #7042


Closed
NlightNFotis opened this issue Jul 28, 2022 · 36 comments
Assignees
Labels
aws Bugs or features of importance to AWS CBMC users aws-high Kani Bugs or features of importance to Kani Rust Verifier RFC Request for comment

Comments

@NlightNFotis
Contributor

NlightNFotis commented Jul 28, 2022

Hello,

We are looking to solicit some feedback from the Kani Team for the following proposal
on future integration between Kani and CBMC.

Do let us know about any inaccuracies in the description of the status quo, or any clarifications/enhancements to the proposals.

Thank you,


Current Integration between Kani and CBMC

The bulk of the code of interest lives under kani/kani-driver. The following steps are traced from kani/kani-driver/src/main.rs.

  1. Translate .rs file into *.symtab.json
    1. cargo_build in kani/kani-driver/src/call_cargo.rs
  2. Iterate through all the *.symtab.json files produced in step 1, and convert them (link all of the symbol table files) into *.symtab.out by calling symtab2gb.
    1. In symbol_table_to_gotoc at kani/kani-driver/src/call_symtab.rs
    2. The output at this level is one (or possibly multiple) GOTO-binaries.
  3. Link the binaries produced at step 2, into one GOTO-binary.
    1. In link_goto_binary at kani/kani-driver/src/call_goto_cc.rs.
    2. The output at this level is a single GOTO-binary.
  4. Perform instrumentation (with goto-instrument) run of file
    1. But after a function has been set as the entry point - designated harness - by goto-cc
    2. In run_goto_instrument at kani/kani-driver/src/call_goto_instrument.rs
    3. Perform goto-model validation, then perform series of instrumentations (drop unused functions, rewrite back edges, etc)
  5. Kani runs cbmc on binary (harness) and collects report
    1. At check_harness in kani/kani-driver/src/main.rs
    2. check_harness calls cbmc through call to run_cbmc of kani/kani-driver/src/call_cbmc.rs
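The five steps above can be sketched as the sequence of external process invocations that kani-driver effectively performs. Note the file names and exact flags below are illustrative assumptions, not copied from Kani's source:

```rust
// Sketch of the current Kani -> CBMC pipeline as external process calls.
// All paths and flags here are illustrative, not Kani's exact ones.
fn pipeline_commands(symtabs: &[&str], harness: &str) -> Vec<Vec<String>> {
    let mut cmds = Vec::new();
    // Step 2: convert each *.symtab.json into a GOTO binary via symtab2gb.
    for st in symtabs {
        cmds.push(vec![
            "symtab2gb".to_string(),
            st.to_string(),
            "--out".to_string(),
            st.replace(".symtab.json", ".symtab.out"),
        ]);
    }
    // Step 3: link all GOTO binaries into one, with goto-cc designating the harness
    // as the entry point (step 4.1).
    let mut link: Vec<String> = vec!["goto-cc".to_string()];
    link.extend(symtabs.iter().map(|st| st.replace(".symtab.json", ".symtab.out")));
    link.extend([
        "--function".to_string(),
        harness.to_string(),
        "-o".to_string(),
        "linked.out".to_string(),
    ]);
    cmds.push(link);
    // Step 4: instrument (drop unused functions, rewrite back edges, etc.).
    cmds.push(vec![
        "goto-instrument".to_string(),
        "--drop-unused-functions".to_string(),
        "linked.out".to_string(),
        "instrumented.out".to_string(),
    ]);
    // Step 5: run cbmc on the instrumented binary and collect the report.
    cmds.push(vec!["cbmc".to_string(), "instrumented.out".to_string()]);
    cmds
}

fn main() {
    for cmd in pipeline_commands(&["a.symtab.json", "b.symtab.json"], "my_harness") {
        println!("{}", cmd.join(" "));
    }
}
```

Each of these command lines corresponds to a separate process spawn today, which is exactly the overhead the proposal below aims to remove.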

Considerations

  1. The Kani Team probably want something like a Rust <-> C++ bridge, kind of like https://cxx.rs, so we can assume that they can call into the C++ functions.
    1. This works by building the C++ and Rust files, along with the bridge. Demo: https://github.com/dtolnay/cxx/tree/master/demo
    2. For a first draft API, we probably need to make these calls easier to enter into.
  2. goto_convert demands exposing the concepts of a symbol_tablet, goto_modelt, and message_handler
    1. Maybe not the message_handler, looks like an implementation detail that could be hidden.
    2. The goto-binary is effectively a serialised version of the goto_modelt
    3. Kani does have its own model of a CBMC symbol table at kani/cprover_bindings/src/goto_program/symbol_table.rs
  3. In theory, linking with goto-cc could be done programmatically, but at the moment it is too stateful (it depends on the state of compilet, which in turn depends on the initial configuration done by the goto-gcc personality).
  4. However, the call to goto-cc --function might be skippable if a call to ansi-c-entry-point can be made.
    1. A bit hard, as for now it also depends on configt, but in theory nothing binds it to it - could be refactored?
  5. Effectively, CBMC::do_it is performing some initialisation that all leads to a selection of a verifier, which is a multi or single-path-symex-checkert, then initialised with some options and the goto-model, and instructed to run, and report on the run results.
    1. The initialisation depends on a goto-program produced through get_goto_program, which in turn calls process_goto_program, which performs a number of transformations on the goto-program, for instance removing inline assembly or skips, performing a reachability slice, etc.
    2. Most of these transformations are also performed by goto-instrument, but there are some problems:
      1. Non-uniform interface between these transformation passes.
      2. Non-delineated boundaries between these transformations/unknown invariants between these different transformation passes.
  6. There are two options for linking. Either calling the linkingt class, like symtab2gb does, which operates on a symbol table level, or calling link_goto_model, which operates at the level of two goto-models.

Proposals

Long-term Goal

  • Standardisation of most operations around a goto-modelt.
    • cbmc, goto-instrument, etc., are going to be loading a goto-modelt in memory.
    • All transformations, analysis, linking, etc. will be working on the goto-modelt.
      • Auxiliary functionality to translate a symbol_tablet into a goto-modelt (through goto_convert, which is also going to be exposed) will enable current producers of symbol_tablet (that now depend on symtab2gb) to easily convert into a goto-modelt for further manipulation/transformations.
    • Subsequent transformations to the goto-modelt (say, for instance, remove_skips) are going to be maintained as a versioned patch-set over the initial model.
      • This would allow rollbacks of transformations to be both possible and efficient in the future.
      • This would also support scripting of the transformations and associated error handling without the need to start from scratch when a transformation has corrupted the goto-model.
  • All tools standardise their interface around a config and goto-modelt.
    • This allows external tools to invoke them with a provided configuration and goto-modelt.
  • This will allow the CPROVER tools to transition, on a long-term basis, into being a library around goto-modelt.
    • This would allow users to write their own transformations/tools over goto-modelt, both as parts of the CPROVER library, and as external tools.
      • The existing tools cbmc, goto-instrument, etc, could then be repurposed to be thin CLI wrappers around the library.
    • External tools will be able to load a goto-modelt via the library, and extract a new one (along with analysis/verification reports, traces, etc.) from it.
  • The clean-up of interfaces and internals of these tools will also offer the extra benefit of increased reliability/robustness for these tools when they are being used/invoked in binary form.
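To illustrate the "thin CLI wrapper around a library" idea, here is a minimal sketch. All types and function names (GotoModel, load_model, remove_unused, goto_instrument_main) are hypothetical placeholders, not existing CBMC symbols:

```rust
// Sketch: all real work lives in library functions over an in-memory model;
// the binary itself only parses options and dispatches. Names are invented.
#[derive(Debug, Clone)]
struct GotoModel {
    functions: Vec<String>, // placeholder for symbol table + goto-functions
}

fn load_model(_path: &str) -> GotoModel {
    // Stub: a real version would deserialise a GOTO binary from disk.
    GotoModel { functions: vec!["main".to_string(), "unused_stub".to_string()] }
}

// A library transformation pass, operating purely on the in-memory model.
fn remove_unused(model: &mut GotoModel, used: &[&str]) {
    model.functions.retain(|f| used.contains(&f.as_str()));
}

// The existing binaries become thin wrappers: parse options, load, call passes.
fn goto_instrument_main(args: &[&str]) -> GotoModel {
    let mut model = load_model(args.get(0).unwrap_or(&"a.out"));
    if args.contains(&"--drop-unused-functions") {
        remove_unused(&mut model, &["main"]);
    }
    model
}

fn main() {
    let model = goto_instrument_main(&["a.out", "--drop-unused-functions"]);
    println!("{:?}", model.functions);
}
```

The point of the sketch is that an external tool could call `load_model` and `remove_unused` directly, bypassing the CLI wrapper entirely.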

Step 1: API for invoking verification with CBMC

  • We will start moving towards the ideals expressed at the Long-term Goal above, by attempting to refactor cbmc (the binary) in a way that exposes a uniform verification interface.
    • Right now, cbmc's entry point is the function doit, which for the most part does configuration management (by parsing command line arguments, etc.) and loading/preprocessing of a goto-modelt, which it then passes on to a verifier (selected based on command line arguments).
    • The aim is to expose these verifiers through a more uniform interface, and make sure that they can be invoked on a goto-modelt and a configt passed to them, with no other dependencies.
  • This allows us to decouple the main analysis engine from global state (which it now depends on, set up while the cbmc binary bootstraps) and make it easier to invoke, both from within the CPROVER tools, and from external tools.
  • At the end of this step, we will have exposed cbmc in a way that enables programmatic access of the verification engine, instead of depending on the call to a binary version of it.
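A minimal sketch of what such a uniform verifier interface could look like. The real interface would be C++; Rust is used here for brevity, and GotoModel, Config, Verifier and SinglePathChecker are hypothetical stand-ins, not CBMC's actual classes:

```rust
// Hypothetical uniform verifier interface: any verifier is invoked with only
// a goto-model and a config, with no other (global) dependencies.
struct GotoModel;
struct Config {
    unwind: u32,
    multi_path: bool,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Success,
    Failure,
}

trait Verifier {
    fn run(&self, model: &GotoModel, config: &Config) -> Verdict;
}

struct SinglePathChecker;
impl Verifier for SinglePathChecker {
    fn run(&self, _model: &GotoModel, config: &Config) -> Verdict {
        // Stub: a real checker would run symbolic execution; here we only
        // demonstrate that everything it needs arrives via its parameters.
        if config.unwind > 0 { Verdict::Success } else { Verdict::Failure }
    }
}

// Verifier selection driven by the config, mirroring what doit does today
// based on command line arguments.
fn select_verifier(config: &Config) -> Box<dyn Verifier> {
    let _ = config.multi_path; // a multi-path checker would be picked here
    Box::new(SinglePathChecker)
}

fn main() {
    let config = Config { unwind: 10, multi_path: false };
    let verdict = select_verifier(&config).run(&GotoModel, &config);
    println!("{:?}", verdict);
}
```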

Step 2: API extended with goto-model transformations

  • As part of the first step, we will have characterised the schema of the goto-modelt required to be passed in to the verifiers.
    • This means that the minimum set of invariants required to analyse a goto-modelt are going to be enforced before any analysis.
  • This allows us to work backwards from that base, and start isolating other transformations on the goto-modelt (for example, the ones provided by goto-instrument) while always preserving the sequence of invariants demanded by these transformations, and ultimately, the verification engine.
    • We can also iron out the rough edges on their interfaces (make them uniform - right now some operate on the level of goto-functions, others on the level of goto-modelt, there are differences in the arities of these functions, etc).
  • At the end of this step, we will have similarly exposed goto-instrument in a way that enables performing the instrumentations that it provides in a programmatic way, without needing to depend on multiple calls to the binary.
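One way to make the pass interfaces uniform, sketched under the assumption that every pass takes and returns a whole model (names like remove_skips here are only loose analogues of the real passes):

```rust
// Sketch of a uniform transformation-pass interface: every pass has the same
// signature over the whole goto-model, so passes compose regardless of
// whether today they operate on goto-functions or on the model.
#[derive(Clone, Debug, PartialEq)]
struct GotoModel {
    instructions: Vec<String>,
}

type Pass = fn(GotoModel) -> Result<GotoModel, String>;

fn remove_skips(mut m: GotoModel) -> Result<GotoModel, String> {
    m.instructions.retain(|i| i != "SKIP");
    Ok(m)
}

fn rewrite_back_edges(m: GotoModel) -> Result<GotoModel, String> {
    Ok(m) // stub: a real pass would rewrite loop back edges here
}

// With uniform signatures, running a pipeline is just a fold over the model,
// and each pass can enforce its invariants before handing the model on.
fn run_passes(model: GotoModel, passes: &[Pass]) -> Result<GotoModel, String> {
    passes.iter().try_fold(model, |m, pass| pass(m))
}

fn main() {
    let model = GotoModel { instructions: vec!["DECL x".to_string(), "SKIP".to_string()] };
    let out = run_passes(model, &[remove_skips, rewrite_back_edges]).unwrap();
    println!("{:?}", out.instructions);
}
```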

Step 3: API extended with compilation and linking

  • After we can get all transformations to work and tools to be centred around a goto-modelt, we can then move on to adding construction of goto-modelts, with compilation and linking supported by an API.
  • At the end of this step, we will have exposed the core functionality provided by goto-cc so that it can be manipulated programmatically.
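The API surface this step would expose can be sketched as two calls, compile and link_models (these names are assumptions that mirror goto-cc's behaviour, not its real entry points):

```rust
// Hypothetical compile/link API: compile one source file into an in-memory
// model, then merge several models into one. Both functions are stubs.
#[derive(Debug, Clone)]
struct GotoModel {
    symbols: Vec<String>,
}

fn compile(source: &str) -> Result<GotoModel, String> {
    // Stub: a real implementation would run the front-end and goto_convert.
    Ok(GotoModel { symbols: vec![format!("sym_from_{source}")] })
}

fn link_models(models: Vec<GotoModel>) -> Result<GotoModel, String> {
    // Stub for linkingt / link_goto_model: merge symbol tables, deduplicating.
    let mut symbols = Vec::new();
    for m in models {
        for s in m.symbols {
            if !symbols.contains(&s) {
                symbols.push(s);
            }
        }
    }
    Ok(GotoModel { symbols })
}

fn main() {
    let models: Vec<_> = ["a.c", "b.c"].iter().map(|f| compile(f).unwrap()).collect();
    let linked = link_models(models).unwrap();
    println!("{:?}", linked.symbols);
}
```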

Step 4: API to enable programmatic construction of goto-modelt

  • At this step, we have all we need to handle compilation/linking/transformations of existing goto-modelts, so the next step is to provide an API that allows staged construction of a goto-modelt in a programmatic fashion.
    • The goto-modelt is effectively a symbol table and a CFG (in the form of goto-functionst) - the API would offer the capability to create a new goto-function, add goto-instructions to its body, and update the symbol table with the symbols it contains.
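A staged builder for this could look roughly as follows. All types here (GotoModel, FunctionBuilder) are hypothetical placeholders for the eventual API; the body is modelled as a sequence of instructions:

```rust
// Sketch of programmatic construction of a goto-model: create a function,
// append instructions to its body, and keep the symbol table in sync.
#[derive(Default, Debug)]
struct GotoModel {
    symbol_table: Vec<String>,
    functions: Vec<(String, Vec<String>)>, // (name, instruction sequence)
}

struct FunctionBuilder {
    name: String,
    body: Vec<String>,
}

impl FunctionBuilder {
    fn new(name: &str) -> Self {
        FunctionBuilder { name: name.to_string(), body: Vec::new() }
    }

    fn instruction(mut self, ins: &str) -> Self {
        self.body.push(ins.to_string());
        self
    }

    // Adding the function also registers its symbol, so the symbol table and
    // the goto-functions can never drift apart.
    fn add_to(self, model: &mut GotoModel) {
        model.symbol_table.push(self.name.clone());
        model.functions.push((self.name, self.body));
    }
}

fn main() {
    let mut model = GotoModel::default();
    FunctionBuilder::new("harness")
        .instruction("DECL x")
        .instruction("ASSERT x == x")
        .add_to(&mut model);
    println!("{:?}", model);
}
```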
@NlightNFotis NlightNFotis added the RFC Request for comment label Jul 28, 2022
@NlightNFotis NlightNFotis self-assigned this Jul 28, 2022
@NlightNFotis
Contributor Author

Tagging some people who might be interested in the above RFC:

@danielsn @tautschnig @feliperodri @jimgrundy

@tautschnig tautschnig added the Kani Bugs or features of importance to Kani Rust Verifier label Jul 28, 2022
@giltho

giltho commented Jul 28, 2022

Hi,
Writing this comment as an outsider, but it could be worth writing a C API (i.e just a thin C layer on top of the current code), instead of interfacing C++ with Rust directly.
It would enable this same API to be used with a lot more different languages, and therefore possibly also by other tools?
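For illustration, such a thin C layer reduces everything to opaque handles and free functions. In this self-contained sketch the "C API" is mocked in Rust itself with extern "C" functions; all function names are invented:

```rust
// Mock of what a thin C layer over the C++ code would look like to a
// consumer: an opaque pointer plus plain functions, no classes, no STL.
// A real build would export these symbols from the C++ side instead.
use std::os::raw::c_int;

struct GotoModel {
    n_functions: c_int, // stands in for the real C++ object behind the handle
}

extern "C" fn cprover_load_model(n_functions: c_int) -> *mut GotoModel {
    Box::into_raw(Box::new(GotoModel { n_functions }))
}

extern "C" fn cprover_model_function_count(model: *const GotoModel) -> c_int {
    unsafe { (*model).n_functions }
}

extern "C" fn cprover_free_model(model: *mut GotoModel) {
    unsafe { drop(Box::from_raw(model)) };
}

fn main() {
    // Any language with C FFI could make these same three calls.
    let model = cprover_load_model(3);
    println!("{}", cprover_model_function_count(model));
    cprover_free_model(model);
}
```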

@tautschnig
Collaborator

Hi, Writing this comment as an outsider, but it could be worth writing a C API (i.e just a thin C layer on top of the current code), instead of interfacing C++ with Rust directly. It would enable this same API to be used with a lot more different languages, and therefore possibly also by other tools?

This might require changing various classes to plain structs, and will prohibit exposing anything that uses the STL. Maybe those are blockers for Rust bindings anyhow, in which case we will have to reconsider this.

@martin-cs
Collaborator

martin-cs commented Jul 29, 2022

@NlightNFotis this is a really well articulated plan but I feel like I am missing some context which will make it hard to C your RF. What problem are you solving? Is this performance? Is this having less code to maintain? Is this having more of the tool chain written in Rust? Is this having a more sophisticated workflow?

Some random, out of context, thoughts:

  • Before starting gnat2goto we looked at a lot of options of how to combine a front-end written in !(C++) with the CPROVER code. That's what symtab2gb is. It was our best attempt at doing this. If you like I can try to dig out the notes from those meetings. We did consider the "wrap C++ in a foreign function interface" route but when you get into the details of C++ ABI stability and what that requires on the other language ... the horror was just too much. As @tautschnig wisely notes, things like templates and the STL kinda ruin things. It /seemed/ like the best way of achieving that was, as @giltho suggests, writing a shadow API in C, linking that to C++ and then writing an interface around that. But that requires writing and maintaining two interface layers and we figured it was easier to just implement one which did a very light serialisation and then write a simple de-serialiser.

  • Another consequence of this discussion was identifying goto-modelt as the pivotal interface. Our goal was to create these and hand them over to the C++. If you want to do a sequence of link and instrument and simplify and ... etc. then I would suggest writing that as a C++ tool and connecting at that level. Without knowing your context it is hard to say whether this is a good idea but it would reduce the API to just building goto-modelt's which is a fraction of the code. The goto-modelt is the narrow point in the architecture diagram.

  • ( Although since then goto-checker has been added and is generally a good abstraction of what the back-ends do. )

  • @NlightNFotis said:

Subsequent transformations to the goto-modelt (say, for instance, remove_skips) are going to be maintained as a versioned patch-set over the initial model.

This sounds kinda scary. We have a very poor record of keeping patch-sets outside of develop alive.

This would allow rollbacks of transformations to be both possible and efficient in the future.
This would also support scripting of the transformations and associated error handling without the need to start from scratch when a transformation has corrupted the goto-model.

This might be over-engineering. Transforms shouldn't be corrupting things. If they are you should throw it away and start again. See the massive discussion about "recovering from invariant failure" and the short answer of "don't; just start again.".

However call to goto-cc --function might be able to be skipped if a call to ansi-c-entry-point can be made.
A bit hard, as for now it also depends on configt, but in theory nothing binds it to it - could be refactored?
...
Non-delineated boundaries between these transformations/unknown invariants between these different transformation passes.

This is why I have been banging on about invariants and normal forms for goto-programs for years. It would be good to clarify what does and doesn't need to be done when creating a goto-modelt.

This will allow CPROVER tools, to transition into being a library around goto-modelt on a long term basis.
This would allow users to write their own transformations/tools over goto-modelt, both as parts of the CPROVER library, and as external tools.
The existing tools cbmc, goto-instrument, etc, could then be repurposed to be thin CLI wrappers around the library.
External tools will be able to load goto-modelt in the library, and extract a new one (along with analysis/verification reports, traces, etc.) from the new library.

Yes! Absolutely! This was the original aim of the CPROVER project; to take CBMC and to split it up into a library of components and functions. This is why the box in the middle of the architecture diagram says "goto-programs". This is why the on disk format and the hand-over between tools are goto-function binaries. This is why there are loads of .h files that just expose a one / a few C-style functions. This is why when they are well maintained (I struggle to call goto-instrument this but it is a great example of the pattern), each of the tools is basically just handling command-line options, loading a goto-program and then calling a bunch of API functions. This was a great idea then and remains a great idea now. (Having a REPL or Python interface for calling the APIs black-box style would also be very exciting and possible with this model.)

The goto-modelt is effectively a symbol table and a CFG (in the form of goto-functionst)

Minor point of slight pedantry. It is not a CFG. There is an interface to constructing those. It is a sequence of instructions. This is a different model and has some advantages and disadvantages.

Hope that is a useful C for your RF.

@martin-cs
Collaborator

Another thought. If you want a cross-language API to create goto-modelts, either by serialisation or by wrapping, you need an exprt interface. This is probably a bigger job than wrapping / interfacing goto-modelt as it is very OO and so not in the "basically C" subset of C++ that is easy to interface to other languages. @smowton 's smart idea was to auto-generate it. I think his vision was to eventually auto-generate std_expr.h et al. along with bindings for other languages. There was a PR in CPROVER I think? Circa... 2016? 2017? The code he wrote definitely wound up in gnat2goto and is here:
https://github.com/diffblue/gnat2goto/blob/master/irep_utils/src/irep_specs_to_ada.py
https://github.com/diffblue/gnat2goto/tree/master/gnat2goto/ireps/irep_specs

@martin-cs
Collaborator

@NlightNFotis If one of your goals is to simplify the interaction between Kani and CPROVER, one option would be to write a C++ program that does (Current Integration between Kani and CBMC) steps 2 to 5 inclusive in one program. This should be plugging together various existing APIs and should not be particularly long.

@martin-cs
Collaborator

#6495 may also be relevant

@danielsn
Contributor

danielsn commented Aug 2, 2022

@NlightNFotis Would it be possible to give some (pseudocode) examples of what the initial API would look like, and what calls on it from an API consumer might look like?

@danielsn
Contributor

danielsn commented Aug 2, 2022

There seem to be questions on two time-scales:

  1. What is the right long-term architecture for CProver/CBMC. The re-architecture here seems to make a lot of sense, but also sounds like a lot of potential work. What is the expected time frame for delivering this?
  2. What is the best API to export in the short-term to allow easy scriptability of CBMC. As mentioned above, a concrete set of possible API pseudocode might help with the discussion.

@jimgrundy jimgrundy added aws Bugs or features of importance to AWS CBMC users aws-high labels Aug 10, 2022
@NlightNFotis
Contributor Author

Hi @danielsn, the questions you have asked merit a bit of deliberation from our end. We aim to get back to you some time next week with answers to those.

@NlightNFotis
Contributor Author

Answering several points from @martin-cs comment (thanks for the detailed thoughts, Martin!)

Before starting gnat2goto we looked at a lot of options of how to combine a front-end written in !(C++) with the CPROVER code. That's what symtab2gb is. It was our best attempt at doing this. If you like I can try to dig out the notes from those meetings. We did consider the "wrap C++ in a foreign function interface" route but when you get into the details of C++ ABI stability and what that requires on the other language ... the horror was just too much.

I seem to recall seeing somewhere an Irep bindings generator, but can't remember if that was part of gnat2goto or something else. IIRC, and if my memory doesn't falter, extending something like that to provide the Irep structures and APIs as headers/implementation files for various languages might be the easiest way to ensure that the API for at least the core structures remains in sync between CBMC and other tools utilising it.

If that's not the case, then we probably need to think about a way to do that. I seem to recall some differences between that IRep API and the one within CBMC causing some difficulties, so there will definitely need to be some standardisation around that as a way forward.

This sounds kinda scary. We have a very poor record of keeping patch-sets outside of develop alive.

Sorry, I probably didn't word this as clearly as I could. Allow me to clarify: what I had in mind is a sort of stack structure, which maintains a set of binary differences over the goto-model, in-memory. That would allow rolling back transformations by effectively popping from that stack, and getting back to the goto-program as it existed before a certain set of transformations happened.
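The structure I have in mind could be sketched like this (a deliberately simplified mock that stores full snapshots rather than binary diffs; all names are illustrative):

```rust
// Sketch of the in-memory rollback stack: each applied transformation pushes
// an undo entry; popping restores the model as it was before that pass.
#[derive(Clone, Debug, PartialEq)]
struct GotoModel {
    instructions: Vec<String>,
}

struct TransformStack {
    snapshots: Vec<GotoModel>, // a real version would store compact diffs
}

impl TransformStack {
    fn new() -> Self {
        TransformStack { snapshots: Vec::new() }
    }

    // Record the pre-pass state, then run the pass on the model in place.
    fn apply(&mut self, model: &mut GotoModel, pass: impl Fn(&mut GotoModel)) {
        self.snapshots.push(model.clone());
        pass(model);
    }

    // Undo the most recent pass; returns false if there is nothing to undo.
    fn rollback(&mut self, model: &mut GotoModel) -> bool {
        match self.snapshots.pop() {
            Some(prev) => {
                *model = prev;
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut model = GotoModel { instructions: vec!["SKIP".to_string(), "GOTO 1".to_string()] };
    let mut stack = TransformStack::new();
    stack.apply(&mut model, |m| m.instructions.retain(|i| i != "SKIP"));
    stack.rollback(&mut model); // the SKIP instruction is back
    println!("{:?}", model.instructions);
}
```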

This might be over-engineering. Transforms shouldn't be corrupting things. If they are you should throw it away and start again. See the massive discussion about "recovering from invariant failure" and the short answer of "don't; just start again.".

I see. Does the clarification in the above paragraph change the calculus here? If rollback of transformations becomes significantly easier (and assuming that transformations are atomic and obey certain invariants), wouldn't it make sense to roll back instead of trying again? Both in terms of performance, and of potential use cases unlocked by the continuous application/rollback of transformation passes.

This is why I have been banging on about invariants and normal forms for goto-programs for years. It would be good to clarify what does and doesn't need to be done when creating a goto-modelt.

I agree wholeheartedly. This is one of the first bits of work we're going to have to do once we get to the part of the plan where we implement the second step of the plan above.

This is why when they are well maintained (I struggle to call goto-instrument this but it is a great example of the pattern), each of the tools is basically just handling command-line options, loading a goto-program and then calling a bunch of API functions.

Yeah, that's also a long-term aim of the plan - to allow certain modules of the whole of CProver to be called in a plug & play fashion from various tools. We're still some way off of that however, and the first order of work that needs to be done before we can even get close to working on this is the outlining/tighter enforcement of invariants between various stages/passes of the various tools.

@NlightNFotis
Contributor Author

Sorry, I kind of missed this comment, and just noticed this:

There was a PR in CPROVER I think? Circa... 2016? 2017? The code he wrote definitely wound up in gnat2goto and is here:
https://github.com/diffblue/gnat2goto/blob/master/irep_utils/src/irep_specs_to_ada.py
https://github.com/diffblue/gnat2goto/tree/master/gnat2goto/ireps/irep_specs

Yeah, these are what I remember. I think they were kind of outdated already in 2018/2019, but maybe some investigation could be done to see if we can bring them up to speed and use them as a base for the CBMC structures.

There's also probably some thought that needs to be put into how that would complicate CBMC's building process, and where they would be located if we were to use them, but it can probably form a good base for us to build on.

@NlightNFotis
Contributor Author

I missed this one, too:

@NlightNFotis If one of your goals is to simplify the interaction between Kani and CPROVER, one option would be to write a C++ program that does (Current Integration between Kani and CBMC) steps 2 to 5 inclusive in one program. This should be plugging together various existing APIs and should not be particularly long.

My personal opinion is that it's not a scalable idea: while it might work for Kani's case, if another tool needs to perform those steps (or a subset of them), we may need to create yet another binary for that.

I think, long term, it might be a better idea to expose those steps as interfaces, and give external tools leeway to mix and match calls to them as suits those tools.

@thomasspriggs
Contributor

Hi, Writing this comment as an outsider, but it could be worth writing a C API (i.e just a thin C layer on top of the current code), instead of interfacing C++ with Rust directly. It would enable this same API to be used with a lot more different languages, and therefore possibly also by other tools?

This might require changing various classes to plain structs, and will prohibit exposing anything that uses the STL. Maybe those are blockers for Rust bindings anyhow, in which case we will have to reconsider this.

The https://cxx.rs/ tool linked by Fotis appears to support a selection of STL types including std::vector, std::unique_ptr and std::shared_ptr. This would indicate that we don't necessarily need to exclude usage of the STL, at least for Rust bindings.

@thomasspriggs
Contributor

One aspect of the interface which hasn't been mentioned yet is error handling. CBMC uses C++ exceptions for user errors, where we want to be able to report the incorrect usage of CBMC. My current understanding is that Rust does not have exceptions. So these would need to be converted to a Result<T, E> on the Rust side of the interface.

CBMC also has invariants which currently end in a call to abort(), which terminates the entire process. This seems like a reasonable solution where CBMC is being used as a CLI tool and it isn't worth attempting to recover internally. However, it seems a little unreasonable to terminate the entire process of an application which is just using CBMC as a library. In this case it may be preferable for the overall process to continue without CBMC, or to show the message/stack trace in a GUI rather than immediately terminating. Therefore it is worth considering how invariants should be handled if CBMC is being used in this context.
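The shape this could take on the Rust side, sketched with invented names (a panic stands in for today's abort(), since a real abort() cannot be caught; user errors become Result):

```rust
// Sketch of library-friendly error handling: user errors surface as Result,
// and invariant violations (mocked here as panics) are contained with
// catch_unwind instead of killing the host process. All names illustrative.
use std::panic;

#[derive(Debug, PartialEq)]
enum CbmcError {
    BadUsage(String),
}

fn run_analysis(unwind: u32) -> Result<&'static str, CbmcError> {
    if unwind == 0 {
        // What a C++ user-error exception would map to on the Rust side.
        return Err(CbmcError::BadUsage("--unwind must be positive".to_string()));
    }
    Ok("VERIFICATION SUCCESSFUL")
}

fn check_invariant(broken: bool) {
    // Stand-in for an INVARIANT that today ends in abort().
    assert!(!broken, "internal invariant violated");
}

fn main() {
    // User error: the caller decides what to do with the Err.
    println!("{:?}", run_analysis(0));
    // Invariant failure: a library embedding could contain it and carry on.
    let contained = panic::catch_unwind(|| check_invariant(true));
    println!("invariant contained: {}", contained.is_err());
}
```

Note that catch_unwind only helps if the library signals invariant failures by unwinding rather than calling abort() directly, which is part of what would need to change.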

@danielsn
Contributor

Kani currently has a Rust model of goto here https://github.com/model-checking/kani/tree/main/cprover_bindings/src/goto_program

@martin-cs
Collaborator

martin-cs commented Aug 11, 2022 via email

@martin-cs
Collaborator

martin-cs commented Aug 11, 2022 via email

@TGWDB
Contributor

TGWDB commented Aug 24, 2022

Pseudocode API for parts of CBMC

Note that this is a sketch of some of the key behaviours of the API planned to allow
CBMC to be used like a library, and in particular by the Kani project. The goal of
this document is to illustrate the main pieces of functionality and envision how they
would be used. It is important to keep in mind that the details here are largely left
out, since focusing on them early (except to ensure information is not lost) would be
premature.

For example, various calls below have some kind of configuration that is required
for the central behaviour to work correctly. This stores the general configuration
information that CBMC (and related tools) use during analysis. Here the exact
content and specification is omitted to focus on the calls themselves, however
this will need to be clarified (including how the configuration can be loaded,
set, checked for validity, etc.) before a proper API can be produced.

API Specification

  • load_model parses a GOTO binary to an in-memory goto_modelt structure from the
    file handle it has been passed. Note that this structure should not be accessed by the calling code directly, only via library calls.
  • Different modes (analysis, instrumentation, etc) have different configurations - this allows us to pass to each tool its own set of customised settings.
    • For the existing CLI tools, these are going to be parsed from the tool's command
      line options. (This is a first step; an in-memory/structured configuration will come in the future.)
  • We intend to consolidate the various tools under a single point of entry for each operation:
    • perform_analysis is the single entry point for the various analyses. The selection
      of the specific analysis to be run, along with any specific configuration (e.g. object_bit_size,
      etc.) is made based on the configuration passed to it.
      • The configuration can be instantiated by the tool that's running (say from the
        CLI arguments to that tool), but nothing stops it being instantiated in a different
        way (say, a tool that has a model of the configuration options and supplies the analysis
        engine with a custom set of options).
    • apply_transformation is the single entry point for various goto_program instrumentation
      passes. It accepts two arguments: the instrumentation to run and the existing model
      and returns a new model with the transformation run (actual call to this will most probably
      be using an out parameter for this instead of returning a goto_modelt for performance reasons).
    • compile is the single point of entry for compilation: you pass it in translation_config
      and a file, and what you get out of it is the goto_modelt (assuming compilation was successful).
    • link_models is the single entry point for linking: it too takes as input a structure that designates
      some translation_config (which contains instructions to the compiler and the linker) and a list of goto_modelts and the end result is a final goto_modelt that is the result of linking all the goto_modelts in the list together.

API Usage

Verification

filename = "analysis/example.c"
model = load_model(from_file(filename))
// The from_cli_options is from the options (e.g. a string or string[])
analysis_config = load_verification_config(from_cli_options(options))

results = perform_analysis(model, analysis_config)
output_results(handle, results, analysis_config)

Everything starts from loading a goto_modelt in memory. (Again note that this model
will only be available as a handle, direct access and manipulation of the model is not
allowed.) load_model is the function that performs this bit of functionality, and
parses a goto_modelt structure from a binary version of a goto-program from a file
handle, the result of the call from_file(), assuming the filename passed to
from_file is valid and no errors occur during the open of the file.

The next bit of work to be done is to initialise the analysis configuration, which
in this case (assuming we're running cbmc as a CLI tool) is being parsed from the
command line arguments passed to the binary. This structure is designed to hold verification
options, (such as single/multi path symex, etc).

Next up is the actual performance of the analysis. The aim here is to provide a single
entry point to verification, which is run on a supplied goto_modelt after it has been tuned by
the configuration provided to it.

results in this case is a results structure that is returned from performing the
analysis. This will hold the analysis results (a specialised data structure
for different analysis performed), properties, traces, errors, etc. With results
being its own structure, the user can then select what happens with the analysis
results (e.g., different models could be run, collecting their results, and then
their results could be diffed programmatically).

The last bit of work here is the actual printing of the results. That is done by the
output_results function, which gets a file handle as the first parameter (it can also output
to handles indicating STDOUT/STDERR, etc), the results structure along with the
configuration options (which allows for example to distinguish between different output
modes the user has requested, etc). Note that result printing is important in the
first phase of the project to allow interaction with legacy usage and for debugging.

Instrumentation

filename = "analysis/example.c"
model = load_model(from_file(filename))
// The from_cli_options is from the options (e.g. a string or string[])
instrumentation_config = load_instrumentation_config(from_cli_options(options))

stack<goto_transformations> transformations

for (transformation in instrumentation_config.transformations) {
    try {
        model = apply_transformation(transformation, model)
        transformations.push(transformation)
    } catch (error) {
        report_error("Transformation couldn't be applied for reason: %s", error.message)
        continue;
    }
}

output_model(instrumentation_config, model)

Instrumentation should work in an orthogonal way to verification - a model gets instantiated
into memory (be it by loading it from a file, or by being passed in by another tool via IPC or
other interop).

Then, assuming an instrumentation configuration (in this particular case we assume it's read
from CLI arguments to a tool like goto-instrument), we iterate through the transformations
requested in the configuration and apply each through apply_transformation, whose API
takes the transformation to be applied and the existing model, returns a modified model,
and pushes the transformation just applied onto a stack of applied transformation passes.

Last but not least, we output the model through a call to output_model, which outputs the
instrumented goto_modelt, depending on the configuration that we have instantiated for the
instrumentation (show goto functions, etc). Note that outputting the model (e.g. to file)
is again important in early phases for legacy interactions and debugging - long term
maintaining everything in memory is the goal.
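The transformation loop above can be sketched in runnable Rust. The names (Transformation, GotoModel, apply_transformation) are illustrative, and returning Result instead of throwing keeps a failed pass local, mirroring the catch-and-continue in the pseudocode:

```rust
// Illustrative sketch of the transformation loop; all names are
// hypothetical stand-ins, not CBMC API.

#[derive(Debug, Clone, PartialEq)]
enum Transformation { AddLibrary, DropUnusedFunctions }

#[derive(Debug, Default)]
struct GotoModel { passes_applied: usize }

// Apply one pass in place, reporting failure as an error message rather
// than an exception, so the driver can skip the pass and continue.
fn apply_transformation(t: &Transformation, model: &mut GotoModel)
    -> Result<(), String>
{
    match t {
        Transformation::AddLibrary | Transformation::DropUnusedFunctions => {
            // A real pass would rewrite the goto program here.
            model.passes_applied += 1;
            Ok(())
        }
    }
}

fn main() {
    let requested = vec![Transformation::AddLibrary,
                         Transformation::DropUnusedFunctions];
    let mut model = GotoModel::default();
    // Stack of passes that actually succeeded, as in the pseudocode.
    let mut applied: Vec<Transformation> = Vec::new();

    for t in &requested {
        match apply_transformation(t, &mut model) {
            Ok(()) => applied.push(t.clone()),
            Err(e) => eprintln!("Transformation couldn't be applied for reason: {e}"),
        }
    }
    assert_eq!(applied, requested);
    assert_eq!(model.passes_applied, 2);
}
```

Keeping the stack of successfully applied passes gives the caller a record of what the model has actually been through, which is useful both for debugging and for deciding whether further passes are safe.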

Translation & Linking

// The from_cli_options is from the options (e.g. a string or string[])
translation_config = load_translation_config(from_cli_options(options))

list<goto_model> translated_files 

// Compile
for (filename in translation_config.files) {
    translated_files.add(compile(translation_config.options, from_file(filename)))
}

// Link into single goto program
goto_model linked = link_models(translation_config.options, translated_files)

// File output for early phases, also for debugging
if (write_to_file) {
    output_model(output_filename, linked)
}

Translation and linking work similarly: there's a single entry point for compilation,
and that's compile, which takes some translation options and a file. After a file
has been compiled into an in-memory goto_modelt, it gets added to a list of translated
files.

After translation has completed successfully for all of the files the translation
engine had to translate, all the goto models in translated_files are linked into
a single goto_modelt. This is done through a call to the single entry point for linking,
link_models, which takes two arguments: the translation configuration (which may
include directives to the compiler and the linker) and the list of goto_modelts to be linked.
The result is a new model produced by linking all the models together.
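The compile-then-link flow can be sketched as follows; compile and link_models are placeholder names mirroring the pseudocode, not real CBMC entry points, and the "compilation" here is stubbed:

```rust
// Hypothetical sketch of translation and linking; names mirror the
// pseudocode above and are not actual CBMC API.

#[derive(Debug)]
struct GotoModel { functions: Vec<String> }

// "Compile" a source file into an in-memory goto model (stubbed: the
// real entry point would parse and translate the file).
fn compile(filename: &str) -> GotoModel {
    GotoModel { functions: vec![format!("{filename}::main")] }
}

// Link several goto models into one by merging their function lists;
// real linking would also resolve symbols across models.
fn link_models(models: Vec<GotoModel>) -> GotoModel {
    let functions = models.into_iter().flat_map(|m| m.functions).collect();
    GotoModel { functions }
}

fn main() {
    let files = ["a.c", "b.c"];
    // Compile each file into an in-memory model, then link them all.
    let translated: Vec<GotoModel> = files.iter().map(|f| compile(f)).collect();
    let linked = link_models(translated);
    assert_eq!(linked.functions.len(), 2);
}
```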

Notes on Phases

There are several implementation choices above that start with doing things in the
"old" way, e.g. loading config from CLI options, to make the initial implementation
faster and easier. However, long term many of these should be done directly in
memory. The goal of this initial sketch here is to identify the approximate API
and a path forward that can be used rapidly to ensure successful integration with
Kani in the early stages of the project.

Incomplete Points

There are some areas that are currently known to be incomplete (and intentionally
left as such here). This is done to converge on this with consultation and
pragmatic approaches to the implementations. These are noted here so that they
are not assumed to be forgotten. Discussion on these points is of course welcome
(this is an RFC!).

  • Error handling: The above API does not detail how error handling should be done.
    A good approach needs to be determined, especially considering that some internal
    CBMC code currently terminates the program as a whole.
  • Session/lifetime: The current assumption is that the library will be set up
    and used for one session. If multiple different analysis runs on different programs
    are desired, then a new instance should be created. (That is, free/release the
    various components and start from scratch.)
  • Isolation: The current plan is to keep all the internals (including goto
    models) isolated and not allow any direct access. All manipulation and interaction
    should be done through library calls.
  • Program editing/manipulation: For the moment, editing of the goto program
    outside of an existing transformation is not available. This may be of
    interest, but for the current API this is assumed to be unavailable. This is
    motivated by the following (among other) factors:
    • Safety: Various passes, functions, and transformations will make assumptions
      about the model. Any editing/modification of the model outside of those known
      about and controlled may break the toolchain.
    • Model complexity/fragility: The model is currently represented by internal
      CBMC structures that would be very complex to expose. Exposing these would
      make the interactions very brittle, and make small changes on either side
      potentially breaking.
  • Exact API Function/Type Specifications: These have not been included here
    (yet) since we wish to converge on a workable API before trying to precisely
    define all the details.

@TGWDB
Copy link
Contributor

TGWDB commented Aug 24, 2022

I'm planning on arranging a time when various stakeholders and contributors who are interested in this project can discuss (perhaps biweekly?). If you'd like to be part of this discussion please message me so I can include you in planning.

@tautschnig tautschnig self-assigned this Sep 2, 2022
@TGWDB
Copy link
Contributor

TGWDB commented Sep 13, 2022

Just a ping to those who are watching - we'd like to progress on the next steps ASAP. If you have comments/examples from Kani's current usage, or expectations/desires, please update here.

@jimgrundy
Copy link
Collaborator

@danielsn what is the status on providing feedback?

@danielsn
Copy link
Contributor

I am reviewing our requirements with the Kani team and will have feedback beginning of next week

@TGWDB
Copy link
Contributor

TGWDB commented Sep 21, 2022

@danielsn We're half way through the week now and I'm not aware of any updates here or on the linked Kani issue. We'd like to move on this work ASAP, should we wait on you or continue anyway?

@danielsn
Copy link
Contributor

danielsn commented Sep 21, 2022

Thanks for the detailed proposal. We should have a larger discussion about next-steps/milestones, but to start this looks like a great first step towards the 1.0 goal of supporting, via the API, the same use-cases we have with the command line.
We have a few questions/concerns:

  1. We would like to have access to the same level and immediacy of information we get by running CBMC at present. In particular, the current API appears to be blocking. CProver steps often take a long time, and it’s really valuable to our users to be able to see up-to-date information. For example, unwinding can sometimes diverge, and it’s useful to be able to see that right away. Perhaps a call-back / polling mechanism?
  2. We would like to have results in a structured format: e.g.
    1. for traces we’d like a vector of trace steps,
    2. for coverage a vector of covered and uncovered locations,
    3. for results a vector of structured properties and results
    4. For errors and warnings a vector of structured results
  3. We would like the ability to do basic queries on the goto-program. E.g.
    1. list properties
    2. list undefined functions
    3. list loops
  4. Memory management is going to be a significant question, and is a “one way door”. We need to make sure we’re very clear on who owns the memory behind each handle, and when they can be freed. I’d suggest putting every handle behind a reference counted smart pointer, which I believe the CXX crate supports.
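A callback-based progress mechanism along the lines of point 1 could look like the sketch below. ProgressEvent and verify_with_progress are hypothetical names, and the event stream is stubbed; the point is only that the caller sees events as they happen rather than after the run completes:

```rust
// Sketch of a non-blocking-friendly progress API; all names are
// hypothetical, not part of any existing CBMC interface.

#[derive(Debug, Clone)]
enum ProgressEvent {
    UnwindingLoop { loop_id: String, iteration: u32 },
    PropertyChecked { name: String, passed: bool },
}

// The caller supplies a callback invoked as events occur, so a driver
// like Kani can surface unwinding progress immediately instead of
// waiting for a whole (possibly diverging) run to finish.
fn verify_with_progress<F: FnMut(ProgressEvent)>(mut on_event: F) -> bool {
    // Stubbed event stream standing in for a real symex run.
    on_event(ProgressEvent::UnwindingLoop {
        loop_id: "main.0".into(),
        iteration: 1,
    });
    on_event(ProgressEvent::PropertyChecked {
        name: "main.assertion.1".into(),
        passed: true,
    });
    true
}

fn main() {
    let mut events = 0;
    let ok = verify_with_progress(|e| {
        events += 1;
        println!("{e:?}");
    });
    assert!(ok);
    assert_eq!(events, 2);
}
```

The same callback shape also composes with the separate-thread approach later suggested by the CBMC side: the callback can simply forward events over a channel to whichever thread owns the UI.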

Otherwise, looks great! Thanks for the proposal.

@tautschnig
Copy link
Collaborator

On the topic of error reporting/error handling: would it perhaps be easier not to expose exceptions at the level of the API proposed in here? Instead, each function should report success/failure? This likely wouldn't require much of a re-architecting, just catching exceptions at the layer that implements the top-level API and turning any exceptions into a "failure" result.

@thomasspriggs
Copy link
Contributor

On the topic of error reporting/error handling: would it perhaps be easier not to expose exceptions at the level of the API proposed in here? Instead, each function should report success/failure? This likely wouldn't require much of a re-architecting, just catching exceptions at the layer that implements the top-level API and turning any exceptions into a "failure" result.

I think I mentioned this previously, but it is preferable not to expose exceptions through the API because the user of the API may be working in a language which doesn't support exceptions at all. My understanding is that cross-language exceptions are generally problematic. That implies that we will need to catch the exceptions on the cbmc side of the API and expose the result status in some other non-exception form.

@jimgrundy
Copy link
Collaborator

I'm in agreement with Michael and Thomas. Best to have an API that returns a boolean success status than to throw an exception. We might want to think about how to return information about the nature of the error - perhaps setting an error code (an enumeration?). File and line information about the location of source code fragments contributing to the error is important to get.

@danielsn
Copy link
Contributor

Sounds like we're all in agreement about this: a failure enum sounds like a good solution here. If we're using Rust, there is an Error type which it could be conveniently translated into.
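A minimal sketch of such a failure enum on the Rust side, under the assumption that exceptions are caught at the library boundary and turned into codes. CProverError and load_model are illustrative names; implementing std::error::Error is what makes the type fit Rust's usual error machinery, as suggested:

```rust
// Sketch of the failure-enum idea: a C-compatible error code that also
// plugs into Rust's error handling. All names are illustrative.

use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)] // stable layout so the code can cross the FFI boundary
enum CProverError {
    ParseError,
    TypeCheckError,
    InternalError,
}

impl fmt::Display for CProverError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A fuller design would carry file/line details about the
        // offending source fragment alongside the code.
        write!(f, "{:?}", self)
    }
}

// Implementing std::error::Error lets callers use `?` and Box<dyn Error>.
impl std::error::Error for CProverError {}

// Hypothetical API function: failure is a value, never an exception.
fn load_model(path: &str) -> Result<(), CProverError> {
    if path.is_empty() {
        Err(CProverError::ParseError)
    } else {
        Ok(())
    }
}

fn main() {
    assert_eq!(load_model(""), Err(CProverError::ParseError));
    assert!(load_model("a.goto").is_ok());
}
```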

@tautschnig tautschnig removed their assignment Oct 4, 2022
@NlightNFotis
Copy link
Contributor Author

Hello, following up here on the first steps of the implementation of the Kani API.

We had an internal meeting yesterday where we ratified the following plan:

  1. We will start with the first step of the plan as outlined in the original proposal (original issue text). To reiterate it here, we aim to provide an API for verification given some options and a goto-model.
  2. The implementation layer will (for now) be in C++, but the API we will expose to you will be in Rust. That is, we will own the C++ -> Rust wrapped API, as that gives us more flexibility and control over some of the unknowns in the process, and allows us to tailor a better API experience for you (by simplifying the interface for you, allowing us to test the Rust side of things as well, or to pivot to a different implementation mechanism if need be).
  3. The C++ -> Rust wrapper is going to be built with cxx.rs. We intend to take it as far as we can for now using that as the preferred implementation.

There were some specific concerns that were raised in our last meeting, to which we have come up with some answers:

  1. Memory ownership: Ownership must be on the library side (i.e. CBMC owns whatever is passed to it, and is responsible for deallocation).
  2. Nonblocking API: We can implement callbacks for progress monitoring, but we feel that the nonblocking issue is better addressed by a separate process/thread from the Kani side.
  3. Structured results: We aim to be giving back data structures directly instead of a textual format.
  4. Structured failures: There are some considerations we need to think through before we get back to you on this specific issue.

There is also a question that has been raised on our end that we would like an answer to:

  1. What platforms does Kani run on? From the documentation it looks like it's Unix-based. Is this a safe assumption for us going forward? Tighter integration with other platforms may complicate the situation for us, depending on the tools on the Rust end and their maturity on Windows.

@feliperodri
Copy link
Collaborator

WIP: #7274

@feliperodri
Copy link
Collaborator

@zhassan-aws did you have a chance to test the documentation of libcprover_rust? Should we close this issue?

@zhassan-aws
Copy link
Collaborator

Yes, I reviewed the documentation. It looks good, but there are a few more APIs that are needed before we can fully rely on the crate for the Kani flow. More specifically, Kani makes the following calls to CBMC (note that this has been updated recently, so it is
slightly different from the steps described in the OP):

  1. It calls goto-cc on the goto binary produced by codegen and links the Kani C library, e.g.
goto-cc a.out /home/ubuntu/git/kani/library/kani/kani_lib.c -o b.out

Then, for each harness in the Rust code, it performs the following steps:

  1. "Specializes" the goto binary for a specific harness by calling goto-cc with --function, e.g.
goto-cc b.out --function harness -o b_harness.out
  2. Calls goto-instrument with --add-library:
goto-instrument --add-library b_harness.out b_harness.out
  3. Calls goto-instrument to replace empty functions and drop unused ones:
goto-instrument --generate-function-body-options assert-false-assume-false --generate-function-body ".*" --drop-unused-functions b_harness.out b_harness.out
  4. Calls goto-instrument to transform loops:
goto-instrument --ensure-one-backedge-per-target b_harness.out b_harness.out
  5. Calls cbmc:
cbmc --bounds-check --pointer-check --div-by-zero-check --float-overflow-check --nan-check --undefined-shift-check --unwinding-assertions --object-bits 16 --unwind 5 --slice-formula b_harness.out --json-ui
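For illustration, the per-harness command lines above can be assembled programmatically. The Rust sketch below builds (but does not spawn) two of them with std::process::Command; the helper function names are hypothetical, while the flags are taken verbatim from the commands listed:

```rust
// Build the per-harness CBMC command lines without running them; the
// helper names are ours, the flags come from Kani's current invocations.

use std::process::Command;

// Specialize the linked binary for one harness via goto-cc --function.
fn specialize_for_harness(binary: &str, harness: &str, out: &str) -> Command {
    let mut cmd = Command::new("goto-cc");
    cmd.args([binary, "--function", harness, "-o", out]);
    cmd
}

// The final cbmc invocation with the checks Kani enables.
fn cbmc_invocation(binary: &str) -> Command {
    let mut cmd = Command::new("cbmc");
    cmd.args(["--bounds-check", "--pointer-check", "--div-by-zero-check",
              "--float-overflow-check", "--nan-check",
              "--undefined-shift-check", "--unwinding-assertions",
              "--object-bits", "16", "--unwind", "5", "--slice-formula",
              binary, "--json-ui"]);
    cmd
}

fn main() {
    let cmd = specialize_for_harness("b.out", "harness", "b_harness.out");
    let args: Vec<String> = cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    assert_eq!(args, ["b.out", "--function", "harness", "-o", "b_harness.out"]);
    assert_eq!(cbmc_invocation("b_harness.out").get_program(), "cbmc");
}
```

Whether these remain subprocess invocations or become direct library calls, having each step behind a function like this makes it clear exactly which API the crate needs to replace.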

We can close this issue as long as we have other issues to track adding this functionality to the API.

@zhassan-aws zhassan-aws removed their assignment Apr 7, 2023
@feliperodri
Copy link
Collaborator

Thank you @zhassan-aws! @NlightNFotis, could you create issues to track these requests and add the kani and aws-high labels so we can track them as "maintenance"?

@NlightNFotis
Copy link
Contributor Author

Hi Felipe, I will be adding new tickets based on what @zhassan-aws has outlined above.

In the meantime, there's also a ticket I had raised before, #7500

Are the steps there still valid, or should we close that in favour of the new tickets?

@feliperodri
Copy link
Collaborator

@NlightNFotis let's close this one in favor of the new tickets.
