Skip to content

Commit 1cd74a0

Browse files
committed
Update docs for goto-programs
1 parent dbd6988 commit 1cd74a0

File tree

1 file changed

+41
-89
lines changed

1 file changed

+41
-89
lines changed

src/goto-programs/README.md

+41-89
Original file line numberDiff line numberDiff line change
@@ -8,100 +8,59 @@
88
\section goto-programs-overview Overview
99
Goto programs are the intermediate representation of the CPROVER tool
1010
chain. They are language independent and similar to many of the compiler
11-
intermediate languages. Section \ref goto-programs describes the
12-
`goto_programt` and `goto_functionst` data structures in detail. However
13-
it useful to understand some of the basic concepts. Each function is a
14-
list of instructions, each of which has a type (one of 18 kinds of
11+
intermediate languages. Each function is a
12+
list of instructions, each of which has a type (one of 19 kinds of
1513
instruction), a code expression, a guard expression and potentially some
1614
targets for the next instruction. They are not natively in static
1715
single-assign (SSA) form. Transitions are nondeterministic (although in
1816
practise the guards on the transitions normally cover form a disjoint
1917
cover of all possibilities). Local variables have non-deterministic
20-
values if they are not initialised. Variables and data within the
21-
program is commonly one of three types (parameterised by width):
22-
`unsignedbv_typet`, `signedbv_typet` and `floatbv_typet`, see
23-
`util/std_types.h` for more information. Goto programs can be serialised
18+
values if they are not initialised. Goto programs can be serialised
2419
in a binary (wrapped in ELF headers) format or in XML (see the various
2520
`_serialization` files).
2621

27-
The `cbmc` option `show-goto-programs` is often a good starting point
22+
The `cbmc` option `--show-goto-programs` is often a good starting point
2823
as it outputs goto-programs in a human readable form. However there are
2924
a few things to be aware of. Functions have an internal name (for
3025
example `c::f00`) and a ‘pretty name’ (for example `f00`) and which is
3126
used depends on whether it is internal or being presented to the user.
32-
The `main` method is the ‘logical’ main which is not necessarily the
33-
main method from the code. In the output `NONDET` is use to represent a
34-
nondeterministic assignment to a variable. Likewise `IF` as a beautified
27+
In the output `NONDET` is use to represent a nondeterministic assignment to a
28+
variable. Likewise `IF` as a beautified
3529
`GOTO` instruction where the guard expression is used as the condition.
36-
`RETURN` instructions may be dropped if they precede an `END_FUNCTION`
37-
instruction. The comment lines are generated from the `locationt` field
30+
The comment lines are generated from the `source_locationt` field
3831
of the `instructiont` structure.
3932

40-
`goto-programs/` is one of the few places in the CPROVER codebase that
41-
templates are used. The intention is to allow the general architecture
42-
of program and functions to be used for other formalisms. At the moment
43-
most of the templates have a single instantiation; for example
44-
`goto_functionst` and `goto_function_templatet` and `goto_programt` and
45-
`goto_program_templatet`.
46-
4733
\section data_structures Data Structures
4834

49-
FIXME: This text is partially outdated.
50-
51-
The common starting point for working with goto-programs is the
52-
`read_goto_binary` function which populates an object of
53-
`goto_functionst` type. This is defined in `goto_functions.h` and is an
54-
instantiation of the template `goto_functions_templatet` which is
55-
contained in `goto_functions_template.h`. They are wrappers around a map
56-
from strings to `goto_programt`’s and iteration macros are provided.
57-
Note that `goto_function_templatet` (no `s`) is defined in the same
58-
header as `goto_functions_templatet` and is gives the C type for the
59-
function and Boolean which indicates whether the body is available
60-
(before linking this might not always be true). Also note the slightly
61-
counter-intuitive naming; `goto_functionst` instances are the top level
35+
A \ref goto_functionst object contains a set of GOTO programs. Note the
36+
counter-intuitive naming: `goto_functionst` instances are the top level
6237
structure representing the program and contain `goto_programt` instances
63-
which represent the individual functions. At the time of writing
64-
`goto_functionst` is the only instantiation of the template
65-
`goto_functions_templatet` but other could be produced if a different
66-
data-structures / kinds of models were needed for functions.
67-
68-
`goto_programt` is also an instantiation of a template. In a similar
69-
fashion it is `goto_program_templatet` and allows the types of the guard
70-
and expression used in instructions to be parameterised. Again, this is
71-
currently the only use of the template. As such there are only really
72-
helper functions in `goto_program.h` and thus `goto_program_template.h`
73-
is probably the key file that describes the representation of (C)
74-
functions in the goto-program format. It is reasonably stable and
75-
reasonably documented and thus is a good place to start looking at the
76-
code.
77-
78-
An instance of `goto_program_templatet` is effectively a list of
79-
instructions (and inner template called `instructiont`). It is important
38+
which represent the individual functions.
39+
40+
An instance of \ref goto_programt is effectively a list of
41+
instructions (a nested class called \ref goto_programt::instructiont).
42+
It is important
8043
to use the copy and insertion functions that are provided as iterators
8144
are used to link instructions to their predecessors and targets and
82-
careless manipulation of the list could break these. Likewise there are
83-
helper macros for iterating over the instructions in an instance of
84-
`goto_program_templatet` and the use of these is good style and strongly
85-
encouraged.
86-
87-
Individual instructions are instances of type `instructiont`. They
88-
represent one step in the function. Each has a type, an instance of
89-
`goto_program_instruction_typet` which denotes what kind of instruction
45+
careless manipulation of the list could break these.
46+
47+
Individual instructions are instances of type \ref goto_programt::instructiont.
48+
They represent one step in the function. Each has a type, an instance of
49+
\ref goto_program_instruction_typet which denotes what kind of instruction
9050
it is. They can be computational (such as `ASSIGN` or `FUNCTION_CALL`),
9151
logical (such as `ASSUME` and `ASSERT`) or informational (such as
92-
`LOCATION` and `DEAD`). At the time of writing there are 18 possible
52+
`LOCATION` and `DEAD`). At the time of writing there are 19 possible
9353
values for `goto_program_instruction_typet` / kinds of instruction.
9454
Instructions also have a guard field (the condition under which it is
9555
executed) and a code field (what the instruction does). These may be
96-
empty depending on the kind of instruction. In the default
97-
instantiations these are of type `exprt` and `codet` respectively and
98-
thus covered by the previous discussion of `irept` and its descendents.
99-
The next instructions (remembering that transitions are guarded by
56+
empty depending on the kind of instruction.
57+
These are of type \ref exprt and \ref codet respectively.
58+
The next instructions (remembering that transitions may be
10059
non-deterministic) are given by the list `targets` (with the
10160
corresponding list of labels `labels`) and the corresponding set of
10261
previous instructions is get by `incoming_edges`. Finally `instructiont`
103-
have informational `function` and `location` fields that indicate where
104-
they are in the code.
62+
has informational `function` and `source_location` fields that indicate where
63+
they are in the source code.
10564

10665
\section goto-conversion Goto Conversion
10766

@@ -124,8 +83,12 @@ digraph G {
12483
}
12584
\enddot
12685

127-
At this stage, CBMC constructs a goto-program from a symbol table. It
128-
does not use the parse tree or the source file at all for this step. This
86+
At this stage, CBMC constructs a goto-program from a symbol table.
87+
Each symbol in the symbol table with function type (that is, the symbol's `type`
88+
field contains a \ref code_typet) will be converted to a corresponding GOTO
89+
program.
90+
91+
It does not use the parse tree or the source file at all for this step. This
12992
may seem surprising, because the symbols are stored in a hash table and
13093
therefore have no intrinsic order; nevertheless, every \ref symbolt is
13194
associated with a \ref source_locationt, allowing CBMC to figure out the
@@ -282,18 +245,6 @@ This is not the final form of the goto-functions, since the lists of
282245
instructions will be 'normalized' in the next step (Instrumentation),
283246
which removes some instructions and adds targets to others.
284247

285-
Note that goto_programt and goto_functionst are each template
286-
instantiations; they are currently the *only* specialization of
287-
goto_program_templatet and goto_functions_templatet, respectively. This
288-
means that the generated Doxygen documentation can be somewhat obtuse
289-
about the actual types of things, and is unable to generate links to the
290-
correct classes. Note that the
291-
\ref goto_programt::instructiont::code "code" member of a
292-
goto_programt's instruction has type \ref codet (its type in the
293-
goto_program_templatet documentation is given as "codeT", as this is the
294-
name of the template's type parameter); similarly, the type of a guard
295-
of an instruction is \ref guardt.
296-
297248
---
298249
\section instrumentation Instrumentation
299250

@@ -324,8 +275,15 @@ previous stage:
324275
are littered with \ref code_skipt "skip" statements. The
325276
instrumentation stage removes the majority of these.
326277

327-
* Function pointers are removed. They are turned into switch statements
328-
(but see the next point; switch statements are further transformed).
278+
* Variable lifespan implied by \ref code_declt instructions and lexical scopes
279+
described by \ref code_blockt nodes is replaced by `DECL` and corresponding
280+
`DEAD` instructions. There are therefore no lexical scopes in GOTO programs
281+
(not even local variable death on function exit is enforced).
282+
283+
* Expressions with side-effects are explicitly ordered so that there is one
284+
effect per instruction (apart from function call instructions, which can
285+
still have many). For example, `y = f() + x++` will have become something like
286+
`tmp1 = f(); y = tmp1 + x; x = x + 1;`
329287

330288
* Compound blocks are eliminated. There are several subclasses of
331289
\ref codet that count as 'compound blocks;' therefore, later stages in
@@ -362,13 +320,7 @@ previous stage:
362320
* \ref code_returnt "return statements" are transformed into
363321
(unconditional) GOTOs whose target is the \ref END_FUNCTION
364322
instruction. Each goto_programt should have precisely one such
365-
instruction. Note the presence of \ref code_deadt, which has a
366-
\ref code_deadt::symbol() "symbol()" member. Deads mark symbols that
367-
have just gone out of scope; typically, a GOTO that jumps to an
368-
END_FUNCTION instruction is preceded by a series of deads. Deads also
369-
follow sequences of instructions that were part of the body of a
370-
block (loop, conditional etc.) if there were symbols declared in that
371-
block.
323+
instruction.
372324

373325
This stage concludes the *analysis-independent* program transformations.
374326

0 commit comments

Comments
 (0)