Skip to content

GOTO language documentation #7471

Closed
Closed
@remi-delmas-3000

Description

@remi-delmas-3000

This thread is to collect needs for the GOTO language documentation effort we're starting in 2023.

The goal is to use this thread to scope the documentation and collect questions the documentation is supposed to answer.

  • which fragment of GOTO shall we document ?
  • what aspects of the language shall we document ?
    • symbols and symbol tables,
    • goto models,
    • goto functions,
    • goto programs,
    • goto expressions abstract syntax,
    • typchecking rules for instructions and expressions
    • block structure and loops for goto programs,
    • three level indexing (as needed to express trace semantics),
    • stack vs heap model,
    • built-in functions and predicates,
    • supporting functions such as __CPROVER_start, __CPROVER_initialize,
    • supporting state variables describing platform arch and memory safety check instrumentation,
    • binary file format
    • trace language syntax
    • trace semantics (or rather, how to map a trace back to)

Activity

NlightNFotis

NlightNFotis commented on Jan 10, 2023

@NlightNFotis
Contributor

We already have a documentation PR we just raised as part of our internal efforts to document more of CBMC.

It's at #7470 and covers the GOTO IR data structures with some light documentation.

celinval

celinval commented on Jan 10, 2023

@celinval
Collaborator

Awesome! Is it possible to also document the goto file format?

martin-cs

martin-cs commented on Jan 11, 2023

@martin-cs
Collaborator
martin-cs

martin-cs commented on Jan 11, 2023

@martin-cs
Collaborator

@remi-delmas-3000 I really want to support this effort because it is so very needed. So please read this as "lessons learnt for the language documentation projects 2014, 2015, etc.":

  • Decide if you are writing documentation for developers / system integrators, beginning users or advanced users. See earlier discussions on the purpose of different bits of documentation.
  • There is already quite a bit of documentation. It is scattered. Bringing it together and rationalising is a worthwhile thing to do. Maintenance of documentation must be included in your plan to avoid this happening again.
  • Lifespan of up-to-date documentation, from best to worse:
  1. Checks run during normal program operation
  2. Checks run in debug mode
  3. Unit / regression tests run in CI
  4. Comments inline.
  5. Everything in doc/
  6. Anything out of the repository
added
awsBugs or features of importance to AWS CBMC users
on Jan 11, 2023
remi-delmas-3000

remi-delmas-3000 commented on Jan 11, 2023

@remi-delmas-3000
CollaboratorAuthor

@martin-cs thanks for jumping in !

There will be two parallel documentation efforts :

  1. (led by Diffblue) document GOTO as seen from inside CBMC for developers
  2. (led by AWS) document GOTO as abstract language for users who need to :
  • produce GOTO from their own source language (they need to understand the abstract syntax/semantics of GOTO),
  • would like to design/specify instrumentation rules for their programs,
  • would like to write a GOTO interpreter that would be capable of validating a counter example trace produced by a GOTO verification tool.

For 2. we envision a document in natural language.

I would recommend getting the documentation and, more importantly, the
testing and checking of the key data structures really good before you
attempt this.

Agreed. We need to understand what is there before attempting to describe it and I'm expecting this to be a bottom-up effort, starting from the implementation.

  1. is annoying. They do change from time to time and I don't think
    anyone is claiming they are so good that we should freeze them. There
    is a good-faith effort to change the version number of GOTO files when
    this happens but I don't think anyone tests it so...

We mostly want to describe the abstract syntax of GOTO models (what's a goto model, a goto function, a goto program, a goto instruction, what operators are available in the expression language, and their semantics, i.e. small-step transition relation over concrete states). This could be similar to GOTO semantics by translation to FOL for instance.

Selecting a concrete syntax could come in a second step.

  1. is more insideous and troubling and why this documentation is so needed but also hard. How constructs are expressed changes as and when things need to be fixed. They are still structurally the same but they use the ireps differently. This is why JSON Symtab based tools tend to break with an inexplicable error deep inside an entirely unrelated part of the code.

I thought that defining how things are modelled is mostly the role source-to-GOTO conversion. Do you know how often the meaning basic GOTO operators or instruction types are changed by changing their interpretation in Symex or when translating the SSA format to SAT or SMT ?

martin-cs

martin-cs commented on Jan 11, 2023

@martin-cs
Collaborator
celinval

celinval commented on Jan 12, 2023

@celinval
Collaborator

Thanks @martin-cs for jumping in. If not goto binaries, which language do you suggest that we use to interface with CBMC?

martin-cs

martin-cs commented on Jan 12, 2023

@martin-cs
Collaborator

@celinval I think goto binaries are a really good way of passing programs between tools. The goto-cc / goto-instrument / analysis tool (cbmc, goto-analyzer, etc.) split has shown to be a good architecture.

What I had meant to communicate in my previous response was that I strongly suggest using the CPROVER code for creating / reading / writing these because it insulates you from a number of drift and updating issues.

In cases where this is not possible (for example, you are not writing a tool in C++) then it's a bit more awkward. symtab2gb and the JSON symtab was our best attempt at this. It isn't ideal but it has worked for a number of years. See #7042 (comment) and the following discussion for some more of the history / context for this.

HTH

celinval

celinval commented on Jan 12, 2023

@celinval
Collaborator

Hi @martin-cs, our tool is written in Rust. We do currently use the JSON format and symtab2gb but it scales poorly. The JSON files are often above 1GB and symtab2gb takes a considerable amount to convert them to goto.

7 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Labels

awsBugs or features of importance to AWS CBMC usersdocumentation

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @NlightNFotis@remi-delmas-3000@martin-cs@esteffin@thomasspriggs

      Issue actions

        GOTO language documentation · Issue #7471 · diffblue/cbmc