8
8
\section goto-programs-overview Overview
9
9
Goto programs are the intermediate representation of the CPROVER tool
10
10
chain. They are language independent and similar to many of the compiler
11
- intermediate languages. Section \ref goto-programs describes the
12
- ` goto_programt ` and ` goto_functionst ` data structures in detail. However
13
- it useful to understand some of the basic concepts. Each function is a
14
- list of instructions, each of which has a type (one of 18 kinds of
11
+ intermediate languages. Each function is a
12
+ list of instructions, each of which has a type (one of 19 kinds of
15
13
instruction), a code expression, a guard expression and potentially some
16
14
targets for the next instruction. They are not natively in static
17
15
single-assign (SSA) form. Transitions are nondeterministic (although in
18
16
practise the guards on the transitions normally cover form a disjoint
19
17
cover of all possibilities). Local variables have non-deterministic
20
- values if they are not initialised. Variables and data within the
21
- program is commonly one of three types (parameterised by width):
22
- ` unsignedbv_typet ` , ` signedbv_typet ` and ` floatbv_typet ` , see
23
- ` util/std_types.h ` for more information. Goto programs can be serialised
18
+ values if they are not initialised. Goto programs can be serialised
24
19
in a binary (wrapped in ELF headers) format or in XML (see the various
25
20
` _serialization ` files).
26
21
27
- The ` cbmc ` option ` – show-goto-programs` is often a good starting point
22
+ The ` cbmc ` option ` -- show-goto-programs` is often a good starting point
28
23
as it outputs goto-programs in a human readable form. However there are
29
24
a few things to be aware of. Functions have an internal name (for
30
25
example ` c::f00 ` ) and a ‘pretty name’ (for example ` f00 ` ) and which is
31
26
used depends on whether it is internal or being presented to the user.
32
- The ` main ` method is the ‘logical’ main which is not necessarily the
33
- main method from the code. In the output ` NONDET ` is use to represent a
34
- nondeterministic assignment to a variable. Likewise ` IF ` as a beautified
27
+ In the output ` NONDET ` is use to represent a nondeterministic assignment to a
28
+ variable. Likewise ` IF ` as a beautified
35
29
` GOTO ` instruction where the guard expression is used as the condition.
36
- ` RETURN ` instructions may be dropped if they precede an ` END_FUNCTION `
37
- instruction. The comment lines are generated from the ` locationt ` field
30
+ The comment lines are generated from the ` source_locationt ` field
38
31
of the ` instructiont ` structure.
39
32
40
- ` goto-programs/ ` is one of the few places in the CPROVER codebase that
41
- templates are used. The intention is to allow the general architecture
42
- of program and functions to be used for other formalisms. At the moment
43
- most of the templates have a single instantiation; for example
44
- ` goto_functionst ` and ` goto_function_templatet ` and ` goto_programt ` and
45
- ` goto_program_templatet ` .
46
-
47
33
\section data_structures Data Structures
48
34
49
- FIXME: This text is partially outdated.
50
-
51
- The common starting point for working with goto-programs is the
52
- ` read_goto_binary ` function which populates an object of
53
- ` goto_functionst ` type. This is defined in ` goto_functions.h ` and is an
54
- instantiation of the template ` goto_functions_templatet ` which is
55
- contained in ` goto_functions_template.h ` . They are wrappers around a map
56
- from strings to ` goto_programt ` ’s and iteration macros are provided.
57
- Note that ` goto_function_templatet ` (no ` s ` ) is defined in the same
58
- header as ` goto_functions_templatet ` and is gives the C type for the
59
- function and Boolean which indicates whether the body is available
60
- (before linking this might not always be true). Also note the slightly
61
- counter-intuitive naming; ` goto_functionst ` instances are the top level
35
+ A \ref goto_functionst object contains a set of GOTO programs. Note the
36
+ counter-intuitive naming: ` goto_functionst ` instances are the top level
62
37
structure representing the program and contain ` goto_programt ` instances
63
- which represent the individual functions. At the time of writing
64
- ` goto_functionst ` is the only instantiation of the template
65
- ` goto_functions_templatet ` but other could be produced if a different
66
- data-structures / kinds of models were needed for functions.
67
-
68
- ` goto_programt ` is also an instantiation of a template. In a similar
69
- fashion it is ` goto_program_templatet ` and allows the types of the guard
70
- and expression used in instructions to be parameterised. Again, this is
71
- currently the only use of the template. As such there are only really
72
- helper functions in ` goto_program.h ` and thus ` goto_program_template.h `
73
- is probably the key file that describes the representation of (C)
74
- functions in the goto-program format. It is reasonably stable and
75
- reasonably documented and thus is a good place to start looking at the
76
- code.
77
-
78
- An instance of ` goto_program_templatet ` is effectively a list of
79
- instructions (and inner template called ` instructiont ` ). It is important
38
+ which represent the individual functions.
39
+
40
+ An instance of \ref goto_programt is effectively a list of
41
+ instructions (a nested class called \ref goto_programt::instructiont).
42
+ It is important
80
43
to use the copy and insertion functions that are provided as iterators
81
44
are used to link instructions to their predecessors and targets and
82
- careless manipulation of the list could break these. Likewise there are
83
- helper macros for iterating over the instructions in an instance of
84
- ` goto_program_templatet ` and the use of these is good style and strongly
85
- encouraged.
86
-
87
- Individual instructions are instances of type ` instructiont ` . They
88
- represent one step in the function. Each has a type, an instance of
89
- ` goto_program_instruction_typet ` which denotes what kind of instruction
45
+ careless manipulation of the list could break these.
46
+
47
+ Individual instructions are instances of type \ref goto_programt::instructiont.
48
+ They represent one step in the function. Each has a type, an instance of
49
+ \ref goto_program_instruction_typet which denotes what kind of instruction
90
50
it is. They can be computational (such as ` ASSIGN ` or ` FUNCTION_CALL ` ),
91
51
logical (such as ` ASSUME ` and ` ASSERT ` ) or informational (such as
92
- ` LOCATION ` and ` DEAD ` ). At the time of writing there are 18 possible
52
+ ` LOCATION ` and ` DEAD ` ). At the time of writing there are 19 possible
93
53
values for ` goto_program_instruction_typet ` / kinds of instruction.
94
54
Instructions also have a guard field (the condition under which it is
95
55
executed) and a code field (what the instruction does). These may be
96
- empty depending on the kind of instruction. In the default
97
- instantiations these are of type ` exprt ` and ` codet ` respectively and
98
- thus covered by the previous discussion of ` irept ` and its descendents.
99
- The next instructions (remembering that transitions are guarded by
56
+ empty depending on the kind of instruction.
57
+ These are of type \ref exprt and \ref codet respectively.
58
+ The next instructions (remembering that transitions may be
100
59
non-deterministic) are given by the list ` targets ` (with the
101
60
corresponding list of labels ` labels ` ) and the corresponding set of
102
61
previous instructions is get by ` incoming_edges ` . Finally ` instructiont `
103
- have informational ` function ` and ` location ` fields that indicate where
104
- they are in the code.
62
+ has informational ` function ` and ` source_location ` fields that indicate where
63
+ they are in the source code.
105
64
106
65
\section goto-conversion Goto Conversion
107
66
@@ -124,8 +83,12 @@ digraph G {
124
83
}
125
84
\enddot
126
85
127
- At this stage, CBMC constructs a goto-program from a symbol table. It
128
- does not use the parse tree or the source file at all for this step. This
86
+ At this stage, CBMC constructs a goto-program from a symbol table.
87
+ Each symbol in the symbol table with function type (that is, the symbol's ` type `
88
+ field contains a \ref code_typet) will be converted to a corresponding GOTO
89
+ program.
90
+
91
+ It does not use the parse tree or the source file at all for this step. This
129
92
may seem surprising, because the symbols are stored in a hash table and
130
93
therefore have no intrinsic order; nevertheless, every \ref symbolt is
131
94
associated with a \ref source_locationt, allowing CBMC to figure out the
@@ -282,18 +245,6 @@ This is not the final form of the goto-functions, since the lists of
282
245
instructions will be 'normalized' in the next step (Instrumentation),
283
246
which removes some instructions and adds targets to others.
284
247
285
- Note that goto_programt and goto_functionst are each template
286
- instantiations; they are currently the * only* specialization of
287
- goto_program_templatet and goto_functions_templatet, respectively. This
288
- means that the generated Doxygen documentation can be somewhat obtuse
289
- about the actual types of things, and is unable to generate links to the
290
- correct classes. Note that the
291
- \ref goto_programt::instructiont::code "code" member of a
292
- goto_programt's instruction has type \ref codet (its type in the
293
- goto_program_templatet documentation is given as "codeT", as this is the
294
- name of the template's type parameter); similarly, the type of a guard
295
- of an instruction is \ref guardt.
296
-
297
248
---
298
249
\section instrumentation Instrumentation
299
250
@@ -324,8 +275,15 @@ previous stage:
324
275
are littered with \ref code_skipt "skip" statements. The
325
276
instrumentation stage removes the majority of these.
326
277
327
- * Function pointers are removed. They are turned into switch statements
328
- (but see the next point; switch statements are further transformed).
278
+ * Variable lifespan implied by \ref code_declt instructions and lexical scopes
279
+ described by \ref code_blockt nodes is replaced by ` DECL ` and corresponding
280
+ ` DEAD ` instructions. There are therefore no lexical scopes in GOTO programs
281
+ (not even local variable death on function exit is enforced).
282
+
283
+ * Expressions with side-effects are explicitly ordered so that there is one
284
+ effect per instruction (apart from function call instructions, which can
285
+ still have many). For example, ` y = f() + x++ ` will have become something like
286
+ ` tmp1 = f(); y = tmp1 + x; x = x + 1; `
329
287
330
288
* Compound blocks are eliminated. There are several subclasses of
331
289
\ref codet that count as 'compound blocks;' therefore, later stages in
@@ -362,13 +320,7 @@ previous stage:
362
320
* \ref code_returnt "return statements" are transformed into
363
321
(unconditional) GOTOs whose target is the \ref END_FUNCTION
364
322
instruction. Each goto_programt should have precisely one such
365
- instruction. Note the presence of \ref code_deadt, which has a
366
- \ref code_deadt::symbol() "symbol()" member. Deads mark symbols that
367
- have just gone out of scope; typically, a GOTO that jumps to an
368
- END_FUNCTION instruction is preceded by a series of deads. Deads also
369
- follow sequences of instructions that were part of the body of a
370
- block (loop, conditional etc.) if there were symbols declared in that
371
- block.
323
+ instruction.
372
324
373
325
This stage concludes the * analysis-independent* program transformations.
374
326
0 commit comments