|
| 1 | +\ingroup module_hidden |
| 2 | +\page cbmc-hacking CBMC Hacking HOWTO |
| 3 | + |
| 4 | +\author Kareem Khazem |
| 5 | + |
| 6 | +This is an introduction to hacking on the `cprover` codebase. It is not |
| 7 | +intended as a user guide to `CBMC` or related tools. It is structured |
| 8 | +as a series of programming exercises that aim to acclimatise the reader |
| 9 | +to the basic data structures and workflow needed for contributing to |
| 10 | +`CBMC`. |
| 11 | + |
| 12 | + |
| 13 | +## Initial setup |
| 14 | + |
| 15 | +Clone the [CBMC repository][cbmc-repo] and build it: |
| 16 | + |
| 17 | + git clone https://github.com/diffblue/cbmc.git |
| 18 | + cd cbmc/src |
| 19 | + make minisat2-download |
| 20 | + make |
| 21 | + |
| 22 | +Ensure that [graphviz][graphviz] is installed on your system (in |
| 23 | +particular, you should be able to run a program called `dot`). Install |
| 24 | +[Doxygen][doxygen] and generate doxygen documentation: |
| 25 | + |
| 26 | + # In the src directory |
| 27 | + doxygen doxyfile |
| 28 | + # View the documentation in a web browser |
| 29 | + firefox doxy/html/index.html |
| 30 | + |
| 31 | +If you've never used doxygen documentation before, get familiar with the |
| 32 | +layout. Open the generated HTML page in a web browser; search for the |
| 33 | +class `goto_programt` in the search bar, and jump to the documentation |
| 34 | +for that class; and read through the copious documentation. |
| 35 | + |
| 36 | +The build writes executable programs into several of the source |
| 37 | +directories. In this tutorial, we'll be using binaries inside the |
| 38 | +`cbmc`, `goto-instrument`, and `goto-cc` directories. Add these |
| 39 | +directories to your `$PATH`: |
| 40 | + |
| 41 | + # Assuming you cloned CBMC into ~/code |
| 42 | + export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH |
| 43 | + # Add to your shell's startup configuration file so that you don't have to run that command every time. |
| 44 | + echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc |
| 45 | + |
| 46 | +Optional: install an image viewer that can read images on stdin. |
| 47 | +I use [feh][feh]. |
| 48 | + |
| 49 | +[cbmc-repo]: https://github.com/diffblue/cbmc/ |
| 50 | +[doxygen]: http://www.stack.nl/~dimitri/doxygen/ |
| 51 | +[graphviz]: http://www.graphviz.org/ |
| 52 | +[feh]: https://feh.finalrewind.org/ |
| 53 | + |
| 54 | + |
| 55 | + |
| 56 | +## Whirlwind tour of the tools |
| 57 | + |
| 58 | +CBMC's code is located under the `cbmc` directory. Even if you plan to |
| 59 | +contribute only to CBMC, it is important to be familiar with several |
| 60 | +other of cprover's auxiliary tools. |
| 61 | + |
| 62 | + |
| 63 | +### Compiling with `goto-cc` |
| 64 | + |
| 65 | +There should be an executable file called `goto-cc` in the `goto-cc` |
| 66 | +directory; make a symbolic link to it called `goto-gcc`: |
| 67 | + |
| 68 | + cd cbmc/src/goto-cc |
| 69 | + ln -s "$(pwd)/goto-cc" goto-gcc |
| 70 | + |
| 71 | +Find or write a moderately-interesting C program; we'll call it `main.c`. |
| 72 | +Run the following commands: |
| 73 | + |
| 74 | + goto-gcc -o main.goto main.c |
| 75 | + cc -o main.exe main.c |
| 76 | + |
| 77 | +Invoke `./main.goto` and `./main.exe` and observe that they run identically. |
| 78 | +The version that was compiled with `goto-gcc` is larger, though: |
| 79 | + |
| 80 | + du -hs *.{goto,exe} |
| 81 | + |
| 82 | +Programs compiled with `goto-gcc` are mostly identical to their `clang`- |
| 83 | +or `gcc`-compiled counterparts, but contain additional object code in |
| 84 | +cprover's intermediate representation. The intermediate representation |
| 85 | +is (informally) called a *goto-program*. |
| 86 | + |
| 87 | + |
| 88 | +### Viewing goto-programs |
| 89 | + |
| 90 | +`goto-instrument` is a Swiss army knife for viewing goto-programs and |
| 91 | +performing single program analyses on them. Run the following command: |
| 92 | + |
| 93 | + goto-instrument --show-goto-functions main.goto |
| 94 | + |
| 95 | +Many of the instructions in the goto-program intermediate representation |
| 96 | +are similar to their C counterparts. `if` and `goto` statements replace |
| 97 | +structured programming constructs. |
| 98 | + |
| 99 | +Find or write a small C program (2 or 3 functions, each containing a few |
| 100 | +varied statements). Compile it using `goto-gcc` as above into an object |
| 101 | +file called `main`. If you installed `feh`, try the following command |
| 102 | +to dump a control-flow graph: |
| 103 | + |
| 104 | + goto-instrument --dot main | tail -n +2 | dot -Tpng | feh - |
| 105 | + |
| 106 | +If you didn't install `feh`, you can write the diagram to the file and |
| 107 | +then view it: |
| 108 | + |
| 109 | + goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png |
| 110 | + Now open main.png with an image viewer |
| 111 | + |
| 112 | +(the invocation of `tail` is used to filter out the first line of |
| 113 | +`goto-instrument` output. If `goto-instrument` writes more or less |
| 114 | +debug output by the time you read this, read the output of |
| 115 | +`goto-instrument --dot main` and change the invocation of `tail` |
| 116 | +accordingly.) |
| 117 | + |
| 118 | +There are a few other views of goto-programs. Run `goto-instrument -h` |
| 119 | +and try the various switches under the "Diagnosis" section. |
| 120 | + |
| 121 | + |
| 122 | + |
| 123 | +## Learning about goto-programs |
| 124 | + |
| 125 | +In this section, you will learn about the basic goto-program data |
| 126 | +structures. Reading from and manipulating these data structures form |
| 127 | +the core of writing an analysis for CBMC. |
| 128 | + |
| 129 | + |
| 130 | +### First steps with `goto-instrument` |
| 131 | + |
| 132 | +<div class=memdoc> |
| 133 | +**Task:** Write a simple C program with a few functions, each containing |
| 134 | +a few statements. Compile the program with `goto-gcc` into a binary |
| 135 | +called `main`. |
| 136 | +</div> |
| 137 | + |
| 138 | + |
| 139 | +The entry point of `goto-instrument` is in `goto_instrument_main.cpp`. |
| 140 | +Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`. |
| 141 | +At some point in that function, there will be a long sequence of `if` statements. |
| 142 | + |
| 143 | +<div class=memdoc> |
| 144 | +**Task:** Add a `--greet` switch to `goto-instrument`, taking an optional |
| 145 | +argument, with the following behaviour: |
| 146 | + |
| 147 | + $ goto-instrument --greet main |
| 148 | + hello, world! |
| 149 | + $ goto-instrument --greet Leperina main |
| 150 | + hello, Leperina! |
| 151 | + |
| 152 | +You will also need to add the `greet` option to the |
| 153 | +`goto_instrument_parse_options.h` file in order for this to work. |
| 154 | +Notice that in the `.h` file, options that take an argument are followed |
| 155 | +by a colon (like `(property):`), while simple switches have no colon. |
| 156 | +Make sure that you `return 0;` after printing the message. |
| 157 | +</div> |
| 158 | + |
| 159 | +The idea behind `goto-instrument` is that it parses a goto-program and |
| 160 | +then performs one single analysis on that goto-program, and then |
| 161 | +returns. Each of the switches in `doit` function of |
| 162 | +`goto_instrument_parse_options` does something different with the |
| 163 | +goto-program that was supplied on the command line. |
| 164 | + |
| 165 | + |
| 166 | +### Goto-program basics |
| 167 | + |
| 168 | +At this point in `goto-instrument_parse_options` (where the `if` |
| 169 | +statements are), the goto-program will have been loaded into the object |
| 170 | +`goto_functions`, of type `goto_functionst`. This has a field called |
| 171 | +`function_map`, a map from function names to functions. |
| 172 | + |
| 173 | + |
| 174 | +<div class="memdoc"> |
| 175 | +**Task:** Add a `--print-function-names` switch to `goto-instrument` |
| 176 | +that prints out the name of every function in the goto-binary. Are |
| 177 | +there any functions that you didn't expect to see? |
| 178 | +</div> |
| 179 | + |
| 180 | +The following is quite difficult to follow from doxygen, but: the value |
| 181 | +type of `function_map` is `goto_function_templatet<goto_programt>`. |
| 182 | + |
| 183 | + |
| 184 | +<div class=memdoc> |
| 185 | +**Task:** Read the documentation for `goto_function_templatet<bodyT>` |
| 186 | +and `goto_programt`. |
| 187 | +</div> |
| 188 | + |
| 189 | +Each goto_programt object contains a list of |
| 190 | +\ref goto_program_templatet::instructiont called |
| 191 | +`instructions`. Each instruction has a field called `code`, which has |
| 192 | +type \ref codet. |
| 193 | + |
| 194 | +<div class=memdoc> |
| 195 | +**Task:** Add a `--pretty-program` switch to `goto-instrument`. This |
| 196 | +switch should use the `codet::pretty()` function to pretty-print every |
| 197 | +\ref codet in the entire program. The strings that `pretty()` generates |
| 198 | +for a codet look like this: |
| 199 | + |
| 200 | + index |
| 201 | + * type: unsignedbv |
| 202 | + * width: 8 |
| 203 | + * #c_type: char |
| 204 | + 0: symbol |
| 205 | + * type: array |
| 206 | + * size: nil |
| 207 | + * type: |
| 208 | + * #source_location: |
| 209 | + * file: src/main.c |
| 210 | + * line: 18 |
| 211 | + * function: |
| 212 | + * working_directory: /some/dir |
| 213 | + 0: unsignedbv |
| 214 | + * width: 8 |
| 215 | + * #c_type: char |
| 216 | + ... |
| 217 | +</div> |
| 218 | + |
| 219 | +The sub-nodes of a particular node in the pretty representation are |
| 220 | +numbered, starting from 0. They can be accessed through the `op0()`, |
| 221 | +`op1()` and `op2()` methods in the `exprt` class. |
| 222 | + |
| 223 | +Every node in the pretty representation has an identifier, accessed |
| 224 | +through the `id()` function. The file `util/irep_ids.def` lists the |
| 225 | +possible values of these identifiers; have a quick scan through that |
| 226 | +file. In the pretty representation above, the following facts are true |
| 227 | +of that particular node: |
| 228 | + |
| 229 | + - `node.id() == ID_index` |
| 230 | + - `node.type().id() == ID_unsignedbv` |
| 231 | + - `node.op0().id() == ID_symbol` |
| 232 | + - `node.op0().type().id() == ID_array` |
| 233 | + |
| 234 | +The fact that the `op0()` child has a `symbol` ID menas that you could |
| 235 | +cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the |
| 236 | +function `to_symbol_expr`. |
| 237 | + |
| 238 | +<div class=memdoc> |
| 239 | +**Task:** Add flags to `goto-instrument` to print out the following information: |
| 240 | +* the name of every function that is *called* in the program; |
| 241 | +* the value of every constant in the program; |
| 242 | +* the value of every symbol in the program. |
| 243 | +</div> |
0 commit comments