Skip to content

Commit d86ede0

Browse files
committed
[docs 4/5] Add CBMC hacking tutorial
A practical tutorial on getting started with CProver development is added and linked to from the front page. The tutorial contains an overview of the codebase and a few preliminary programming exercises, intended to give would-be CProver contributors an introduction to the key data structures used throughout the codebase.
1 parent 3a58226 commit d86ede0

File tree

2 files changed

+249
-0
lines changed

2 files changed

+249
-0
lines changed

doc/architectural/front-page.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,4 +32,10 @@ hosted on GitHub.
3232
contributors to CBMC. It describes the stages through which CBMC
3333
transforms source files into bug reports and counterexamples, linking
3434
to the relevant documentation for each stage.
35+
36+
* The \subpage cbmc-hacking "CBMC hacking HOWTO" helps new contributors
37+
to CProver to get their feet wet through a series of programming
38+
exercises---mostly modifying goto-instrument, and thus learning to
39+
manipulate the main data structures used within CBMC.
40+
3541
\defgroup module_hidden _hidden

doc/architectural/howto.md

Lines changed: 243 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,243 @@
1+
\ingroup module_hidden
2+
\page cbmc-hacking CBMC Hacking HOWTO
3+
4+
\author Kareem Khazem
5+
6+
This is an introduction to hacking on the `cprover` codebase. It is not
7+
intended as a user guide to `CBMC` or related tools. It is structured
8+
as a series of programming exercises that aim to acclimatise the reader
9+
to the basic data structures and workflow needed for contributing to
10+
`CBMC`.
11+
12+
13+
## Initial setup
14+
15+
Clone the [CBMC repository][cbmc-repo] and build it:
16+
17+
git clone https://github.com/diffblue/cbmc.git
18+
cd cbmc/src
19+
make minisat2-download
20+
make
21+
22+
Ensure that [graphviz][graphviz] is installed on your system (in
23+
particular, you should be able to run a program called `dot`). Install
24+
[Doxygen][doxygen] and generate doxygen documentation:
25+
26+
# In the src directory
27+
doxygen doxyfile
28+
# View the documentation in a web browser
29+
firefox doxy/html/index.html
30+
31+
If you've never used doxygen documentation before, get familiar with the
32+
layout. Open the generated HTML page in a web browser; search for the
33+
class `goto_programt` in the search bar, and jump to the documentation
34+
for that class; and read through the copious documentation.
35+
36+
The build writes executable programs into several of the source
37+
directories. In this tutorial, we'll be using binaries inside the
38+
`cbmc`, `goto-instrument`, and `goto-cc` directories. Add these
39+
directories to your `$PATH`:
40+
41+
# Assuming you cloned CBMC into ~/code
42+
export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH
43+
# Add to your shell's startup configuration file so that you don't have to run that command every time.
44+
echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc
45+
46+
Optional: install an image viewer that can read images on stdin.
47+
I use [feh][feh].
48+
49+
[cbmc-repo]: https://github.com/diffblue/cbmc/
50+
[doxygen]: http://www.stack.nl/~dimitri/doxygen/
51+
[graphviz]: http://www.graphviz.org/
52+
[feh]: https://feh.finalrewind.org/
53+
54+
55+
56+
## Whirlwind tour of the tools
57+
58+
CBMC's code is located under the `cbmc` directory. Even if you plan to
59+
contribute only to CBMC, it is important to be familiar with several
60+
other of cprover's auxiliary tools.
61+
62+
63+
### Compiling with `goto-cc`
64+
65+
There should be an executable file called `goto-cc` in the `goto-cc`
66+
directory; make a symbolic link to it called `goto-gcc`:
67+
68+
cd cbmc/src/goto-cc
69+
ln -s "$(pwd)/goto-cc" goto-gcc
70+
71+
Find or write a moderately-interesting C program; we'll call it `main.c`.
72+
Run the following commands:
73+
74+
goto-gcc -o main.goto main.c
75+
cc -o main.exe main.c
76+
77+
Invoke `./main.goto` and `./main.exe` and observe that they run identically.
78+
The version that was compiled with `goto-gcc` is larger, though:
79+
80+
du -hs *.{goto,exe}
81+
82+
Programs compiled with `goto-gcc` are mostly identical to their `clang`-
83+
or `gcc`-compiled counterparts, but contain additional object code in
84+
cprover's intermediate representation. The intermediate representation
85+
is (informally) called a *goto-program*.
86+
87+
88+
### Viewing goto-programs
89+
90+
`goto-instrument` is a Swiss army knife for viewing goto-programs and
91+
performing single program analyses on them. Run the following command:
92+
93+
goto-instrument --show-goto-functions main.goto
94+
95+
Many of the instructions in the goto-program intermediate representation
96+
are similar to their C counterparts. `if` and `goto` statements replace
97+
structured programming constructs.
98+
99+
Find or write a small C program (2 or 3 functions, each containing a few
100+
varied statements). Compile it using `goto-gcc` as above into an object
101+
file called `main`. If you installed `feh`, try the following command
102+
to dump a control-flow graph:
103+
104+
goto-instrument --dot main | tail -n +2 | dot -Tpng | feh -
105+
106+
If you didn't install `feh`, you can write the diagram to the file and
107+
then view it:
108+
109+
goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png
110+
Now open main.png with an image viewer
111+
112+
(the invocation of `tail` is used to filter out the first line of
113+
`goto-instrument` output. If `goto-instrument` writes more or less
114+
debug output by the time you read this, read the output of
115+
`goto-instrument --dot main` and change the invocation of `tail`
116+
accordingly.)
117+
118+
There are a few other views of goto-programs. Run `goto-instrument -h`
119+
and try the various switches under the "Diagnosis" section.
120+
121+
122+
123+
## Learning about goto-programs
124+
125+
In this section, you will learn about the basic goto-program data
126+
structures. Reading from and manipulating these data structures form
127+
the core of writing an analysis for CBMC.
128+
129+
130+
### First steps with `goto-instrument`
131+
132+
<div class=memdoc>
133+
**Task:** Write a simple C program with a few functions, each containing
134+
a few statements. Compile the program with `goto-gcc` into a binary
135+
called `main`.
136+
</div>
137+
138+
139+
The entry point of `goto-instrument` is in `goto_instrument_main.cpp`.
140+
Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`.
141+
At some point in that function, there will be a long sequence of `if` statements.
142+
143+
<div class=memdoc>
144+
**Task:** Add a `--greet` switch to `goto-instrument`, taking an optional
145+
argument, with the following behaviour:
146+
147+
$ goto-instrument --greet main
148+
hello, world!
149+
$ goto-instrument --greet Leperina main
150+
hello, Leperina!
151+
152+
You will also need to add the `greet` option to the
153+
`goto_instrument_parse_options.h` file in order for this to work.
154+
Notice that in the `.h` file, options that take an argument are followed
155+
by a colon (like `(property):`), while simple switches have no colon.
156+
Make sure that you `return 0;` after printing the message.
157+
</div>
158+
159+
The idea behind `goto-instrument` is that it parses a goto-program and
160+
then performs one single analysis on that goto-program, and then
161+
returns. Each of the switches in `doit` function of
162+
`goto_instrument_parse_options` does something different with the
163+
goto-program that was supplied on the command line.
164+
165+
166+
### Goto-program basics
167+
168+
At this point in `goto-instrument_parse_options` (where the `if`
169+
statements are), the goto-program will have been loaded into the object
170+
`goto_functions`, of type `goto_functionst`. This has a field called
171+
`function_map`, a map from function names to functions.
172+
173+
174+
<div class="memdoc">
175+
**Task:** Add a `--print-function-names` switch to `goto-instrument`
176+
that prints out the name of every function in the goto-binary. Are
177+
there any functions that you didn't expect to see?
178+
</div>
179+
180+
The following is quite difficult to follow from doxygen, but: the value
181+
type of `function_map` is `goto_function_templatet<goto_programt>`.
182+
183+
184+
<div class=memdoc>
185+
**Task:** Read the documentation for `goto_function_templatet<bodyT>`
186+
and `goto_programt`.
187+
</div>
188+
189+
Each goto_programt object contains a list of
190+
\ref goto_program_templatet::instructiont called
191+
`instructions`. Each instruction has a field called `code`, which has
192+
type \ref codet.
193+
194+
<div class=memdoc>
195+
**Task:** Add a `--pretty-program` switch to `goto-instrument`. This
196+
switch should use the `codet::pretty()` function to pretty-print every
197+
\ref codet in the entire program. The strings that `pretty()` generates
198+
for a codet look like this:
199+
200+
index
201+
* type: unsignedbv
202+
* width: 8
203+
* #c_type: char
204+
0: symbol
205+
* type: array
206+
* size: nil
207+
* type:
208+
* #source_location:
209+
* file: src/main.c
210+
* line: 18
211+
* function:
212+
* working_directory: /some/dir
213+
0: unsignedbv
214+
* width: 8
215+
* #c_type: char
216+
...
217+
</div>
218+
219+
The sub-nodes of a particular node in the pretty representation are
220+
numbered, starting from 0. They can be accessed through the `op0()`,
221+
`op1()` and `op2()` methods in the `exprt` class.
222+
223+
Every node in the pretty representation has an identifier, accessed
224+
through the `id()` function. The file `util/irep_ids.def` lists the
225+
possible values of these identifiers; have a quick scan through that
226+
file. In the pretty representation above, the following facts are true
227+
of that particular node:
228+
229+
- `node.id() == ID_index`
230+
- `node.type().id() == ID_unsignedbv`
231+
- `node.op0().id() == ID_symbol`
232+
- `node.op0().type().id() == ID_array`
233+
234+
The fact that the `op0()` child has a `symbol` ID menas that you could
235+
cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the
236+
function `to_symbol_expr`.
237+
238+
<div class=memdoc>
239+
**Task:** Add flags to `goto-instrument` to print out the following information:
240+
* the name of every function that is *called* in the program;
241+
* the value of every constant in the program;
242+
* the value of every symbol in the program.
243+
</div>

0 commit comments

Comments
 (0)