This thread is to collect needs for the GOTO language documentation effort we're starting in 2023.
The goal is to use this thread to scope the documentation and collect questions the documentation is supposed to answer.
- which fragment of GOTO shall we document?
- what aspects of the language shall we document?
- symbols and symbol tables,
- goto models,
- goto functions,
- goto programs,
- goto expressions abstract syntax,
- type-checking rules for instructions and expressions,
- block structure and loops for goto programs,
- three level indexing (as needed to express trace semantics),
- stack vs heap model,
- built-in functions and predicates,
- supporting functions such as __CPROVER_start, __CPROVER_initialize,
- supporting state variables describing platform arch and memory safety check instrumentation,
- binary file format
- trace language syntax
- trace semantics (or rather, how to map a trace back to)
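To make the nesting of the first few items concrete, here is a minimal sketch in Python of how a goto model holds a symbol table and a set of goto functions, where each function body is a goto program, i.e. a sequence of instructions. Every class and field name below is hypothetical and heavily simplified for illustration; this is not the CBMC C++ API, which carries much more information (source locations, typed expressions, and so on).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Symbol:
    # Hypothetical, simplified symbol: a name plus a type rendered as text.
    name: str
    type: str


@dataclass
class Instruction:
    # Hypothetical instruction: a kind tag (e.g. "ASSIGN", "GOTO", "ASSERT"),
    # a textual payload, and an optional jump target (index into the program).
    kind: str
    payload: str = ""
    target: Optional[int] = None


@dataclass
class GotoProgram:
    # A goto program is modelled here as a flat list of instructions.
    instructions: List[Instruction] = field(default_factory=list)


@dataclass
class GotoFunction:
    # A goto function pairs parameter names with a body.
    parameters: List[str]
    body: GotoProgram


@dataclass
class GotoModel:
    # A goto model bundles the symbol table with the functions.
    symbol_table: Dict[str, Symbol] = field(default_factory=dict)
    functions: Dict[str, GotoFunction] = field(default_factory=dict)


def build_example() -> GotoModel:
    """Build a one-function model: declare x, assign 0, assert x == 0."""
    model = GotoModel()
    model.symbol_table["main::x"] = Symbol("main::x", "signed int")
    body = GotoProgram([
        Instruction("DECL", "main::x"),
        Instruction("ASSIGN", "main::x = 0"),
        Instruction("ASSERT", "main::x == 0"),
        Instruction("END_FUNCTION"),
    ])
    model.functions["main"] = GotoFunction(parameters=[], body=body)
    return model
```

The sketch only captures containment (model, functions, program, instructions); typing rules, block structure and the trace-related items from the list are deliberately left out.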
NlightNFotis commented on Jan 10, 2023
We already have a documentation PR we just raised as part of our internal efforts to document more of CBMC.
It's at #7470 and covers the GOTO IR data structures with some light documentation.
celinval commented on Jan 10, 2023
Awesome! Is it possible to also document the goto file format?
martin-cs commented on Jan 11, 2023
@remi-delmas-3000 I really want to support this effort because it is so very needed. So please read this as "lessons learnt for the language documentation projects 2014, 2015, etc.":
`doc/`
`goto_programt` and working down (`exprt`) and up (`goto_modelt` and `symbol_tablet`) as they are core to pretty much everything. See also RFC: Documenting and checking "normal form" properties of goto_models #6495.
`validate()` methods. That is exactly why they were introduced.
`#return_value` behaves inconsistently #4336), dead code and untested areas of the code base.
remi-delmas-3000 commented on Jan 11, 2023
@martin-cs thanks for jumping in!
There will be two parallel documentation efforts:
For 2. we envision a document in natural language.
Agreed. We need to understand what is there before attempting to describe it and I'm expecting this to be a bottom-up effort, starting from the implementation.
We mostly want to describe the abstract syntax of GOTO models (what's a goto model, a goto function, a goto program, a goto instruction, what operators are available in the expression language, and their semantics, i.e. a small-step transition relation over concrete states). This could be similar to GOTO semantics by translation to FOL, for instance.
Selecting a concrete syntax could come in a second step.
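As an illustration of what "small-step transition relation over concrete states" could mean, here is a toy sketch in Python. It is an assumption-laden simplification, not CBMC's semantics: the instruction set is reduced to four kinds, expressions are plain Python callables over an integer store, and types, memory, function calls and nondeterminism are ignored. All names (`Instr`, `State`, `step`, `run`, `PropertyViolation`) are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Optional

Store = Dict[str, int]
Expr = Callable[[Store], int]


class PropertyViolation(Exception):
    """Raised by this toy interpreter when an ASSERT condition is false."""


class Instr(NamedTuple):
    kind: str                      # "ASSIGN", "GOTO", "ASSERT" or "END_FUNCTION"
    var: Optional[str] = None      # left-hand side for ASSIGN
    expr: Optional[Expr] = None    # right-hand side, guard, or asserted condition
    target: Optional[int] = None   # jump target for GOTO


class State(NamedTuple):
    pc: int        # index of the next instruction to execute
    store: Store   # concrete values of the variables


def step(program: List[Instr], state: State) -> Optional[State]:
    """Perform one transition; return None once END_FUNCTION is reached."""
    instr = program[state.pc]
    if instr.kind == "ASSIGN":
        new_store = dict(state.store)
        new_store[instr.var] = instr.expr(state.store)
        return State(state.pc + 1, new_store)
    if instr.kind == "GOTO":
        # A GOTO without a guard is unconditional.
        taken = instr.expr is None or instr.expr(state.store) != 0
        return State(instr.target if taken else state.pc + 1, state.store)
    if instr.kind == "ASSERT":
        if instr.expr(state.store) == 0:
            raise PropertyViolation(f"assertion failed at pc={state.pc}")
        return State(state.pc + 1, state.store)
    if instr.kind == "END_FUNCTION":
        return None
    raise ValueError(f"unknown instruction kind: {instr.kind}")


def run(program: List[Instr], store: Store) -> Store:
    """Iterate step() from pc 0 and return the final store."""
    state: Optional[State] = State(0, dict(store))
    last = state
    while state is not None:
        last = state
        state = step(program, state)
    return last.store


def example_program() -> List[Instr]:
    """Sum 3 + 2 + 1 with a loop expressed through guarded GOTOs."""
    return [
        Instr("ASSIGN", var="i", expr=lambda s: 3),
        Instr("ASSIGN", var="s", expr=lambda s: 0),
        Instr("GOTO", expr=lambda s: s["i"] == 0, target=6),
        Instr("ASSIGN", var="s", expr=lambda s: s["s"] + s["i"]),
        Instr("ASSIGN", var="i", expr=lambda s: s["i"] - 1),
        Instr("GOTO", target=2),
        Instr("ASSERT", expr=lambda s: s["s"] == 6),
        Instr("END_FUNCTION"),
    ]
```

Calling `run(example_program(), {})` executes the loop to completion. Treating a failed assertion as an exception is a choice made for this sketch only; a documented semantics would need to say precisely what a violated assertion means for the rest of the trace.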
I thought that defining how things are modelled is mostly the role of source-to-GOTO conversion. Do you know how often the meaning of basic GOTO operators or instruction types is changed by changing their interpretation in Symex or when translating the SSA format to SAT or SMT?
martin-cs commented on Jan 11, 2023
celinval commented on Jan 12, 2023
Thanks @martin-cs for jumping in. If not goto binaries, which language do you suggest that we use to interface with CBMC?
martin-cs commented on Jan 12, 2023
@celinval I think goto binaries are a really good way of passing programs between tools. The `goto-cc` / `goto-instrument` / analysis tool (`cbmc`, `goto-analyzer`, etc.) split has shown to be a good architecture.
What I had meant to communicate in my previous response was that I strongly suggest using the CPROVER code for creating / reading / writing these because it insulates you from a number of drift and updating issues.
In cases where this is not possible (for example, you are not writing a tool in C++) then it's a bit more awkward. `symtab2gb` and the JSON symtab was our best attempt at this. It isn't ideal but it has worked for a number of years. See #7042 (comment) and the following discussion for some more of the history / context for this.
HTH
celinval commented on Jan 12, 2023
Hi @martin-cs, our tool is written in Rust. We do currently use the JSON format and `symtab2gb` but it scales poorly. The JSON files are often above 1GB and `symtab2gb` takes a considerable amount of time to convert them to `goto`.