-
Notifications
You must be signed in to change notification settings - Fork 273
Proposal of Interface between Kani/CProver tools #7042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Tagging some people who might be interested in the above RFC: |
Hi, |
This might require changing various classes to plain structs, and will prohibit exposing anything that uses the STL. Maybe those are blockers for Rust bindings anyhow, in which case we will have to reconsider this. |
@NlightNFotis this is a really well articulated plan but I feel like I am missing some context which will make it hard to C your RF. What problem are you solving? Is this performance? Is this having less code to maintain? Is this having more of the tool chain written in Rust? Is this having a more sophisticated workflow? Some random, out of context, thoughts:
This sounds kinda scary. We have a very poor record of keeping patch-sets outside of develop alive.
This might be over-engineering. Transforms shouldn't be corrupting things. If they are you should throw it away and start again. See the massive discussion about "recovering from invariant failure" and the short answer of "don't; just start again.".
This is why I have been banging on about invariants and normal forms for goto-programs for years. It would be good to clarify what does and doesn't need to be done when creating a
Yes! Absolutely! This was the original aim of the CPROVER project; to take CBMC and to split it up into a library of components and functions. This is why the box in the middle of the architecture diagram says "goto-programs". This is why the on disk format and the hand-over between tools are goto-function binaries. This is why there are loads of
Minor point of slight pedantry. It is not a CFG. There is an interface to constructing those. It is a sequence of instructions. This is a different model and has some advantages and disadvantages. Hope that is a useful C for your RF. |
Another thought. If you want a cross-language API to create |
@NlightNFotis If one of your goals is to simplify the interaction between Kani and CPROVER, one option would be to write a C++ program that does (Current Integration between Kani and CBMC) steps 2 to 5 inclusive in one program. This should be plugging together various existing APIs and should not be particularly long. |
#6495 may also be relevant |
@NlightNFotis Would it be possible to give some (pseudocode) examples of what the initial API would look like, and what calls on it from an API consumer might look like? |
There seem to be questions on two time-scales:
|
Hi @danielsn, the questions you have asked merit a bit of deliberation from our end. We aim to get back to you some time next week with answers to those. |
Answering several points from @martin-cs comment (thanks for the detailed thoughts, Martin!)
I seem to recall seeing somewhere an If that's not the case, then we probably need to think about a way to do that. I seem to recall some differences between that
Sorry, I probably didn't word this as clearly as I could. Allow me to clarify: what I had in mind is a sort of stack structure, which maintains a set of binary differences over the goto-model, in-memory . That would allow to rollback transformations by effectively popping from that stack, and getting back at the goto-program as it existed before a certain set of transformations have happened.
I see. Does the clarification in the above paragraph change the calculus here? If rollback of transformations becomes significantly easier, (and assuming that transformations are atomic and obey certain invariants), wouldn't it make sense to rollback instead of trying again? Both in terms of performance, and of potential use cases unlocked by the continuous application/rollback of transformation passes.
I agree wholeheartedly. This is one of the first bits of work we're going to have to do once we get to the part of the plan where we implement the second step of the plan above.
Yeah, that's also a long term aim of the plan - to allow certain modules of the whole of CProver to be called on a plug & play fashion from various tools. We're still some way off of that however, and the first order of work that needs to be done before we can even get close to work on this is the outlining/tighter enforcement of invariants between various stages/passes of the various tools. |
Sorry, I kind of missed this comment, and just noticed this:
Yeah, these are what I remember. I think they were kind of outdated already in 2018/2019, but maybe some investigation could be done to see if we can bring them up to speed and use them as a base for the CBMC structures. There's also probably some thought that needs to be put into how that would complicate CBMC's building process, and where they would be located if we were to use them, but it can probably form a good base for us to build on. |
I missed this one, too:
My personal opinion is that it's not a scalable idea, because while it might work for Kani's case, if anyone needs another tool to do those or a subset of these steps, we may need to create another binary for that. I think long term might be a better idea if we can expose those steps as interfaces, and give leeway to external tools to mix and match calls to those as it suits those tools. |
The https://cxx.rs/ tool linked by Fotis appears to support a selection of STL types including |
One aspect of the interface which hasn't been mentioned yet is error handling. CBMC uses C++ exceptions for user errors, where we want to be able to report the incorrect usage of CBMC. My current understanding is that Rust does not have exceptions. So these would need to be converted to a CBMC also has invariants which currently end in a call to |
Kani currently has a Rust model of |
> Before starting gnat2goto we looked at a lot of options of how to
> combine a front-end written in !(C++) with the CPROVER code. That's
> what symtab2gb is. It was our best attempt at doing this. If you
> like I can try to dig out the notes from those meetings. We did
> consider the "wrap C++ in a foreign function interface" route but
> when you get into the details of C++ ABI stability and what that
> requires on the other language ... the horror was just too much.
I seem to recall seeing somewhere an Irep bindings generator, but
can't remember if that was part of gnat2goto or something else. IIRC,
and my memory doesn't falter, extending something like that to
provide the Irep structures and APIs as headers/implementation files
for various languages might be the easiest way to ensure that the API
for at least core structures remains in-sync between CBMC and other
tools utilising it.
It is certainly worth considering. It might even be worth
autogenerating std_expr.h anyway just to reduce the amount of
boilerplate code.
If that's not the case, then we probably need to think about a way to
do that. I seem to recall some differences between that IRep API and
the one within CBMC causing some difficulties, so there will
definitely need to be some standardisation around that as a way
forward.
There were a few issues but given that it pretty much works 6 years
later it is a viable approach.
Sorry, I probably didn't word this as clearly as I could. Allow me to
clarify: what I had in mind is a sort of stack structure, which
maintains a set of binary differences over the goto-model, in-memory
. That would allow to rollback transformations by effectively popping
from that stack, and getting back at the goto-program as it existed
before a certain set of transformations have happened.
Given that irept's are copy-on-write, we are not too far from this Just
Working. If you deep copied the goto-model before the transform then
this ~should~ JFW. Adding CoW to goto-model components could make
things a bit more efficient but the "iterators for program locations"
could bite you.
> This might be over-engineering. Transforms shouldn't be corrupting
> things. If they are you should throw it away and start again. See
> the massive discussion about "recovering from invariant failure"
> and the short answer of "don't; just start again.".
I see. Does the clarification in the above paragraph change the
calculus here? If rollback of transformations becomes significantly
easier, (and assuming that transformations are atomic and obey
certain invariants), wouldn't it make sense to rollback instead of
trying again? Both in terms of performance, and of potential use
cases unlocked by the continuous application/rollback of
transformation passes.
I don't think we have any transforms that are not deterministic. If
something fails then it is "start again and don't do that" and "submit
a bug report". I don't see rollback adding much there but ... if you
do then by all means implement it.
> This is why I have been banging on about invariants and normal
> forms for goto-programs for years. It would be good to clarify what
> does and doesn't need to be done when creating a goto-modelt.
I agree wholeheartedly. This is one of the first bits of work we're
going to have to do once we get to the part of the plan where we
implement the second step of the plan above.
Amazing! ❤️
> This is why when they are well maintained (I struggle to call goto-
> instrument this but it is a great example of the pattern), each of
> the tools is basically just handling command-line options, loading
> a goto-program and then calling a bunch of API functions.
Yeah, that's also a long term aim of the plan - to allow certain
modules of the whole of CProver to be called on a plug & play fashion
from various tools. We're still some way off of that however, and the
first order of work that needs to be done before we can even get
close to work on this is the outlining/tighter enforcement of
invariants between various stages/passes of the various tools.
Agreed. Worth doing regardless of inter-language API issues.
> @NlightNFotis<https://github.com/NlightNFotis> If one of your goals
> is to simplify the interaction between Kani and CPROVER, one option
> would be to write a C++ program that does (Current Integration
> between Kani and CBMC) steps 2 to 5 inclusive in one program. This
> should be plugging together various existing APIs and should not be
> particularly long.
My personal opinion is that it's not a scalable idea, because while
it might work for Kani's case, if anyone needs another tool to do
those or a subset of these steps, we may need to create another
binary for that.
I think long term might be a better idea if we can expose those steps
as interfaces, and give leeway to external tools to mix and match
calls to those as it suits those tools.
Would you let me try to persuade you otherwise?
A. We already maintain tools like this -- jbmc, jdiff, janalyzer, etc.
There has not been a huge demand for more of them.
B. If you / we improve the "componentiness" of CPROVER then maintaining
these should get even easier.
C. goto-models are a pretty stable interface -- it has to be given the
amount of code that assumes things about it. symtab2gb, despite pretty
light maintainable and pretty much no development, has worked as an
inter-language interface for 6 years. This is longer than many out-of-
tree tools in C++ have lasted.
D. For whatever reasons, we don't have a good track record with out-of-
tree tools. If they have a have a cross-language API and limitations
on what is exposed via the API I am not sure they will fare better.
I think that 'minimising the size of the cross language API to just
transferring the goto-modelt and having a custom tool in C++, either
in-tree or out-of-tree' is the least effort, most aligned with current
practice and most demonstrated to be maintainable.
|
> One aspect of the interface which hasn't been mentioned yet is
> error handling. CBMC uses C++ exceptions for user errors, where we
> want to be able to report the incorrect usage of CBMC. My current
> understanding is that Rust does not have exceptions. So these would
> need to be converted to a Result<T, E> on the Rust side of the
> interface.
CBMC also has invariants which currently end in a call to abort()
which terminates the entire process. This seems like a reasonable
solution where CBMC is being used as a CLI tool and it isn't worth
attempting to recover internally.
@thomasspriggs makes a good point. Often there is little that can be
done internally to recover; the code needs to be fixed.
However it seems a little unreasonable to terminate the entire
process of an application which is just using CBMC as a library.
Agreed. This has always been a bit of an issue with the "CPROVER is a
library of components" vision. It sort of works if it is "CPROVER is a
library of components for building your own verification tool" but not
really if it is "CPROVER is a library of components just like libz, you
can just add it into your program".
In this case it maybe preferable for the overall process to continue
without CBMC or to show the message/stack trace in a GUI rather than
immediately terminating.
If you are running it as an external binary and building with GCC then:
LINKFLAGS=-rdynamic
should get you the stack backtrace and extended diagnostics on failure.
Therefore it is worth considering how invariants should be handled
if CBMC is being used in this context.
Yes; building the "library" version with CPROVER_INVARIANT_DO_NOT_CHECK
or set cbmc_invariants_should_throw=true then you should avoid this but
... you might not be better off as there may not be any sane recovery.
Also failing to catch the invariant_failedt somewhere sensible has lead
to some impressively obscure bugs in the past.
|
Pseudocode API for parts of CBMCNote that this is a sketch of some of the key behaviours of the API planned to allow For example, various calls below have some kind of configuration that is required API Specification
API UsageVerificationfilename = "analysis/example.c"
model = load_model(from_file(filename))
// The from_cli_options is from the options (e.g. a string or atring[])
analysis_config = load_verification_config(from_cli_options(options))
results = perform_analysis(model, analysis_config)
output_results(handle, results, analysis_config) Everything starts from loading a The next bit of work to be done is to initialise the analysis configuration, which Next up is the actual performance of the analysis. The aim here is to provide a single
The last bit of work here is the actual printing of the Instrumentationfilename = "analysis/example.c"
model = load_model(from_file(filename))
// The from_cli_options is from the options (e.g. a string or atring[])
instrumentation_config = load_instrumentation_config(from_cli_options(options))
stack<goto_transformations> transformations
for (transformation in instrumentation_config.transformations) {
try {
model = apply_transformation(transformation, model)
transformations.push(transformation)
} catch {
report_error("Transformation couldn't be applied for reason: %s", result.error.message)
continue;
}
}
output_model(instrumentation_config, model) Instrumentation should work in a orthogonal way to verification - a model gets instantiated Then assuming an instrumentation configuration (in this particular case, we assume it's read Last but not least, we output the model through a call to Translation & Linking// The from_cli_options is from the options (e.g. a string or atring[])
translation_config = load_translation_config(from_cli_options(options))
list<goto_model> translated_files
// Compile
for (filename in translation_config.files) {
translated_files.add(compile(translation_config.options, from_file(filename)))
}
// Link into single goto program
goto_model linked = link_models(translation_config.options, translated_files)
// Fiel output for early phases, also for debugging
if (write_to_file) {
output_model(output_filename, linked)
} Translation and linking work similarly: there's a single entry point for compilation, After this translation has been completed successfully for all of the files the translation Notes on PhasesThere are several implementation choices above that start with doing things in the Incomplete PointsThere are some areas that are currently known to be incomplete (and intentionally
|
I'm planning on arranging a time when various stakeholders and contributors who are interested in this project can discuss (perhaps biweekly?). If you'd like to be part of this discussion please message me so I can include you in planning. |
Just a ping to those who are watching - we'd like to progress on the next steps ASAP. If you have comments/examples from Kani's current usage, or expectations/desires, please update here. |
@danielsn what is the status on providing feedback? |
I am reviewing our requirements with the Kani team and will have feedback beginning of next week |
@danielsn We're half way through the week now and I'm not aware of any updates here or on the linked Kani issue. We'd like to move on this work ASAP, should we wait on you or continue anyway? |
Thanks for the detailed proposal. We should have a larger discussion about next-steps/milestones, but to start this looks like a great first step towards the 1.0 goal of supporting with the API the same use-cases we have with command-line.
Otherwise, looks great! Thanks for the proposal. |
On the topic of error reporting/error handling: would it perhaps be easier not to expose exceptions at the level of the API proposed in here? Instead, each function should report success/failure? This likely wouldn't require much of a re-architecting, just catching exceptions at the layer that implements the top-level API and turning any exceptions into a "failure" result. |
I think I mentioned this previously, but it is preferable not to expose exceptions through the API because the user of the API may be working in a language which doesn't support exceptions at all. My understanding is that cross-language exceptions are generally problematic. That implies that we will need to catch the exceptions on the cbmc side of the API and expose the result status in some other non-exception form. |
I'm in agreement with Michael and Thomas. Best to have an API that returns a boolean success status than throw an exception. We might want to think about how to return information about the nature of the error - perhaps setting n error code (an enumeration?). File and line information about the location of source code fragments contributing to the error are important to get. |
Sounds like we're all in agreement about this: a failure enum sounds like a good solution here. If we're using Rust, there is an |
Hello, following up here on the first steps of the implementation of the Kani API. We had an internal meeting yesterday where we ratified the following plan:
There were some specific concerns that were raised in our last meeting, to which we have come up with some answers:
There is also a question that has been raised on our end that we would like an answer to:
|
WIP: #7274 |
@zhassan-aws did you have a chance to test the documentation of libcprover_rust? Should we close this issue? |
Yes, I reviewed the documentation. It looks good, but there are a few more APIs that are needed before we can fully rely on the crate for the Kani flow. More specifically, Kani makes the following calls to CBMC (Note that this has been updated recently, so is
Then, for each harness in the Rust code, it performs the following steps:
We can close this issue as long as we have other issues to track adding this functionality to the API. |
Thank you @zhassan-aws! @NlightNFotis, could you create issues to track these requests and add the |
Hi Felipe, I will be adding new tickets based on what @zhassan-aws has outlined above. In the meantime, there's also a ticket I had raise before, #7500 Are the steps there still valid, or should we close that in favour of the new tickets? |
@NlightNFotis let's close this one in favor of the new tickets. |
Hello,
We are looking to solicit some feedback from the Kani Team for the following proposal
on future integration between Kani and CBMC.
Do let us know about any inaccuracies in the description of the status quo, or any clarifications/enhancements to the proposals.
Thank you,
Current Integration between Kani and CBMC
Bulk of code of interest is under
kani/kani-driver
. The following steps are traced fromkani/kani-driver/src/main.rs
..rs
file into*.symtab.json
cargo_build
inkani/kani-driver/src/call_cargo.rs
*.symtab.json
files produced in step 1, and convert them (link all of the symbol table files) into*.symtab.out
by callingsymtab2gb
.symbol_table_to_gotoc
atkani/kani-driver/src/call_symtab.rs
GOTO-binary
GOTO-binary
.link_goto_binary
atkani/kani-driver/src/call_goto_cc.rs
.GOTO-binary
.goto-instrument
) run of filegoto-cc
run_goto_instrument
atkani/kani-driver/src/call_goto_instrument.rs
goto-model
validation, then perform series of instrumentations (drop unused functions, rewrite back edges, etc)cbmc
on binary (harness) and collects reportcheck_harness
inkani/kani-driver/src/main.rs
check_harness
callscbmc
through call torun_cbmc
ofkani/kani-driver/src/call_cbmc.rs
Considerations
https://cxx.rs
, so we can assume that they can call into the C++ functions.goto_convert
demands exposing the concepts of asymbol_tablet
,goto_modelt
, andmessage_handler
message_handler
, looks like an implementation detail that could be hidden.goto-binary
is effectively a serialised version of thegoto_modelt
kani/cprover_bindings/src/goto_program/symbol_table.rs
goto-cc
could be done programmatically, but at the moment too stateful (depends on state ofcompilet
which depends on initial configuration done bygoto-gcc
personality).goto-cc --function
might be able to be skipped if a call toansi-c-entry-point
can be made.configt
, but in theory nothing binds it to it - could be refactored?CBMC::do_it
is performing some initialisation that all leads to a selection of averifier
, which is amulti
orsingle-path-symex-checkert
, then initialised with some options and the goto-model, and instructed to run, and report on the run results.goto-program
produced throughget_goto_program
, which in turn callsprocess_goto_program
, which performs a number of transformations on thegoto-program
, for instance removing inline assembly or skips, performing a reachability slice, etc.goto-instrument
, but there are some problems:linkingt
class, likesymtab2gb
does, which operates on a symbol table level, or callinglink_goto_model
, which operates at the level of two goto-models.Proposals
Long-term Goal
goto-modelt
.cbmc
,goto-instrument
, etc., are going to be loading agoto-modelt
in memory.goto-modelt
.symbol_tablet
into agoto-modelt
(throughgoto_convert
is also going to be exposed), to enable current producers ofsymbol_tablet
(that now depend onsymtab2gb
) to be able to easily convert into agoto-modelt
for further manipulation/transformations.goto-modelt
(say, for instance,remove_skips
) are going to be maintained as a versioned patch-set over the initial model.config
andgoto-modelt
.goto-modelt
.goto-modelt
on a long term basis.goto-modelt
, both as parts of the CPROVER library, and as external tools.cbmc
,goto-instrument
, etc, could then be repurposed to be thin CLI wrappers around the library.goto-modelt
in the library, and extract a new one (along with analysis/verification reports, traces, etc.) from the new library.Step 1: API for invoking verification with CBMC
cbmc
(the binary) in a way that exposes a uniform verification interface.cbmc
's entry point is the functiondoit
, which for the most part does configuration management (by parsing command line arguments, etc) and loading/preprocessing of agoto-modelt
which it then passes on a verifier (selected based on command line arguments).goto-modelt
and aconfigt
passed to them, with no other dependencies.cbmc
in a way that enables programmatic access of the verification engine, instead of depending on the call to a binary version of it.Step 2: API extended with goto-model transformations
goto-modelt
required to be passed in to the verifiers.goto-modelt
are going to be enforced before any analysis.goto-modelt
(for example, the ones provided bygoto-instrument
) while always preserving the sequence of invariants demanded by these transformations, and ultimately, the verification engine.goto-functions
, others on the level ofgoto-modelt
, there are differences in the arities of these functions, etc).goto-instrument
in a way that enables performing the instrumentations that it provides in a programmatic way, without needing to depend on multiple calls to the binary.Step 3: API extended with compilation and linking
goto-modelt
, we can then move on adding construction ofgoto-modelt
s with compilation and linking supported by an API.goto-cc
so that it can be manipulated programmatically.Step 4: API to enable programmatic construction of
goto-modelt
goto-modelt
s, so the next step is to provide an API that allows staged construction of agoto-modelt
in a programmatic function.goto-modelt
is effectively a symbol table and a CFG (in the form ofgoto-functionst
) - the API would offer the capability to create a newgoto-function
, addgoto-instructions
to its body, and update the symbol table with the symbols it contains.The text was updated successfully, but these errors were encountered: