DOC-43: Documentation of GOTO model (de)serialisation. #2802

Merged
merged 1 commit into from
Sep 5, 2018

marek-trtik
Contributor

No description provided.

The content of the written stream will have this strucutre:
- The header:
- A magic number: byte `0x7f` followed by 3 characters `GBF`.
- A version number written in the 7-bit encoding. Currently, only version `4` is supported.
Collaborator

Could a link explaining "7-bit encoding" be provided? It's used a lot below.
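To make the header layout quoted above concrete, here is a minimal C++ sketch. `write_header_sketch` is a hypothetical name, not CBMC's API. It assumes that a version number below 128 occupies a single byte in the 7-bit encoding, with the continuation bit clear, so version `4` becomes the byte `0x04`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch of the header layout: the byte 0x7f, the characters
// "GBF", then the version number in the 7-bit encoding.
void write_header_sketch(std::ostream &out, unsigned version)
{
  // Assumption: only versions below 128 are handled, because they fit in
  // a single 7-bit group (one byte with the continuation bit cleared).
  assert(version < 128);
  out.put(static_cast<char>(0x7f)); // magic byte
  out.write("GBF", 3);              // magic characters
  out.put(static_cast<char>(version));
}
```

Writing version `4` into a `std::ostringstream` yields five bytes: `0x7f`, `G`, `B`, `F`, `0x04`.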

- The string `s.mode`.
- The string `s.pretty_name`.
- The number `0` in the 7-bit encoding.
- The flags word in the 7-bit encoding. The bits in the flags word correspond to the following `bool` fields (from the most significant bit):
Collaborator

I'd use "Boolean" instead of "bool".
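The flags word can be sketched as follows, assuming the Boolean fields occupy consecutive low-order bits, with the first field listed in the most significant of those bits and the last field in bit 0. The field list is not part of the quoted hunk, so the sketch takes the values as a plain `std::vector<bool>`; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs Boolean fields into a flags word: the first field lands in the most
// significant of the used bits, the last field in bit 0.
std::uint32_t pack_flags_sketch(const std::vector<bool> &fields)
{
  assert(fields.size() <= 32); // must fit in a 32-bit word
  std::uint32_t flags = 0;
  for(bool field : fields)
    flags = (flags << 1) | (field ? 1u : 0u);
  return flags;
}

// Inverse of pack_flags_sketch for a known number of fields.
std::vector<bool> unpack_flags_sketch(std::uint32_t flags, std::size_t count)
{
  std::vector<bool> fields(count);
  for(std::size_t i = 0; i < count; ++i)
    fields[i] = ((flags >> (count - 1 - i)) & 1u) != 0;
  return fields;
}
```

For example, the fields `{true, false, true, true}` pack to `0b1011`, and unpacking four fields from that word restores them in the same order.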

- The array of individual functions with bodies. Each written function has this structure:
- The string with the name of the function.
- The number of instructions in the body of the function in the 7-bit encoding.
- The array of individual instruction in function's body. Each written instruction `I` has this structure:
Collaborator

individual instruction_s_


An important property of the serialisation is that each serialised `irept`
instance occurs in the stream exactly once. Namely, in the position of
its first serialisation query. All other such queries save only hash
Collaborator

only a hash
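A minimal sketch of the write-once sharing described in this hunk, using a hypothetical `nodet` as a stand-in for `irept` and a vector of tokens as a stand-in for the binary stream. Two simplifications are assumptions: nodes are identified by address rather than by hash code, and reference numbers are assigned in order of first occurrence.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for irept: a label plus shared sub-trees.
struct nodet
{
  std::string id;
  std::vector<std::shared_ptr<const nodet>> subs;
};

// Write-once sharing: a node's first occurrence emits a fresh reference
// number followed by its full content; later occurrences emit only the
// reference number. Nodes are identified by address here (an assumption;
// the quoted text describes a hash code).
class sharing_writer_sketcht
{
public:
  std::vector<std::string> tokens; // stands in for the output stream

  void write(const std::shared_ptr<const nodet> &node)
  {
    const auto found = refs.find(node.get());
    if(found != refs.end())
    {
      tokens.push_back(std::to_string(found->second)); // reference only
      return;
    }
    const std::size_t ref = refs.size(); // numbered by first occurrence
    refs.emplace(node.get(), ref);
    tokens.push_back(std::to_string(ref));
    tokens.push_back(node->id);
    tokens.push_back(std::to_string(node->subs.size()));
    for(const auto &sub : node->subs)
      write(sub);
  }

private:
  std::map<const nodet *, std::size_t> refs;
};
```

For a root `plus` whose two children are the same leaf `x`, the tokens are `0 plus 2 1 x 0 1`: the leaf's content appears once, and its second occurrence costs a single reference token.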

the [previous subsection](\ref subsection-goto-binary-serialisation).
The process of the deserialisation does not involve any seeking in the file.
The content is read linearly from the beginning to the end. `irept` instances
and their string constats are deserialised into the memory only once at their
Collaborator

s/constats/constants/

The process of the deserialisation does not involve any seeking in the file.
The content is read linearly from the beginning to the end. `irept` instances
and their string constats are deserialised into the memory only once at their
first occurreces in the stream. All subsequent deserialisation queries are
Collaborator

s/occurreces/occurrence/
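The linear, seek-free reading described in this hunk can be sketched over a hypothetical token format, not CBMC's byte layout. Each node starts with a reference number. A number equal to the count of nodes read so far introduces a new node (label, child count, children). A smaller number refers back to a node already in memory, which is returned as is. `nodet` and `sharing_reader_sketcht` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for irept: a label plus shared sub-trees.
struct nodet
{
  std::string id;
  std::vector<std::shared_ptr<const nodet>> subs;
};

// Reads tokens strictly front to back, with no seeking. A reference number
// equal to the number of nodes read so far introduces a new node whose
// content follows; a smaller number refers back to a node already in
// memory, which is returned instead of being rebuilt.
class sharing_reader_sketcht
{
public:
  explicit sharing_reader_sketcht(std::vector<std::string> input)
    : tokens(std::move(input))
  {
  }

  std::shared_ptr<const nodet> read()
  {
    const std::size_t ref = std::stoul(next());
    if(ref < table.size())
      return table[ref]; // subsequent occurrence: reuse the node
    if(ref != table.size())
      throw std::runtime_error("reference number out of order");
    table.push_back(nullptr); // claim the number before reading children
    auto node = std::make_shared<nodet>();
    node->id = next();
    const std::size_t count = std::stoul(next());
    for(std::size_t i = 0; i < count; ++i)
      node->subs.push_back(read());
    table[ref] = node;
    return node;
  }

private:
  std::string next()
  {
    if(pos >= tokens.size())
      throw std::runtime_error("unexpected end of input");
    return tokens[pos++];
  }

  std::vector<std::string> tokens;
  std::size_t pos = 0;
  std::vector<std::shared_ptr<const nodet>> table;
};
```

Reading `0 plus 2 1 x 0 1` rebuilds a `plus` node whose two children are the very same `x` object in memory.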

- `elf_reader.h`
- `elf_reader.cpp`

\subsubsection subsection-goto-binary-deserialisation-from-mac-o-fat-image Deserialisation from Mac-O fat image
Collaborator

It's "mach-o", not "mac-o" (multiple instances below as well)

\subsubsection irep-serialization-numbers Serialization of Numbers

A number is serialised in 7-bit encoding. For example, given a 2-byte
number in base 2, like `10101010 01010101`, then it is save in 3 bytes,
Collaborator

it is save_d_

number in base 2, like `10101010 01010101`, then it is save in 3 bytes,
where each byte takes only 7 bits from the number, reading from the
left. The 8th bit in each output byte is set to 1 except in the last
byte, where the bit is 0. That 0 bit indicates the end of the end
Collaborator

duplicate "the end"
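A minimal sketch of the 7-bit encoding discussed here, which may also answer the earlier request for an explanation of the term. The function names are hypothetical. One assumption to flag: this sketch emits the least significant 7-bit group first, as LEB128 does, while the quoted text speaks of taking bits "reading from the left". The continuation-bit rule is the same in either order; only the order of the groups would differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Encodes u in 7-bit groups, least significant group first (an assumption
// about group order). Every byte except the last has its top bit set to 1;
// the last byte has it set to 0, which marks the end of the number.
std::string encode_7bit_sketch(std::uint64_t u)
{
  std::string out;
  while(true)
  {
    const unsigned char group = static_cast<unsigned char>(u & 0x7f);
    u >>= 7;
    if(u == 0)
    {
      out.push_back(static_cast<char>(group)); // last byte: top bit 0
      return out;
    }
    out.push_back(static_cast<char>(group | 0x80)); // more bytes follow
  }
}

// Inverse of encode_7bit_sketch; pos is advanced past the bytes consumed.
std::uint64_t decode_7bit_sketch(const std::string &in, std::size_t &pos)
{
  std::uint64_t result = 0;
  for(unsigned shift = 0; pos < in.size(); shift += 7)
  {
    if(shift > 63)
      throw std::runtime_error("number too long");
    const unsigned char byte = static_cast<unsigned char>(in[pos++]);
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    if((byte & 0x80) == 0)
      return result; // top bit 0: this was the last byte
  }
  throw std::runtime_error("truncated number");
}
```

With this group order, the example number `10101010 01010101` (`0xAA55`) encodes to the three bytes `0xD5 0xD4 0x02`, and a small value such as `4` stays a single byte.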

@@ -377,3 +377,181 @@ This stage concludes the *analysis-independent* program transformations.
\subsubsection goto-program-example-1-section Unsigned mult (unsigned a, unsigned b) { int acc, i; for (i = 0; i < b; i++) acc += a; return acc; }

To be documented.


\section section-goto-binary Binary Represenration
Member

Representation


To serialise a `goto_modelt` instance `gm` to a stream `ostr` call the function `write_goto_binary`, e.g. `write_goto_binary(ostr, gm)`.

The content of the written stream will have this strucutre:
Member

structure

its first serialisation query. All other such queries save only hash
code (i.e. reference) of the `irept` instance.

The similar strategy is used for serialisation of string constants
Member

A similar

The similar strategy is used for serialisation of string constants
shared amongst `irept` instances. Such a string is fully saved only in
the first serialisation query of an `irept` instance it appears in and
all other queries only save its integer has code.
Member

as code?
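The write-once scheme for shared string constants can be sketched as a round trip. The types are hypothetical and, as an assumption, strings are numbered sequentially in order of first occurrence instead of by hash code. This lets the reader recognise a new string by its number alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Write side: a string's full text is emitted only at its first
// occurrence; afterwards only its number is emitted. Numbers are assigned
// in order of first occurrence (an assumption; the quoted text speaks of a
// hash code).
struct string_writer_sketcht
{
  std::vector<std::string> tokens; // stands in for the output stream
  std::unordered_map<std::string, std::size_t> numbers;

  void write(const std::string &s)
  {
    const auto inserted = numbers.emplace(s, numbers.size());
    tokens.push_back(std::to_string(inserted.first->second));
    if(inserted.second)
      tokens.push_back(s); // first occurrence: full text follows
  }
};

// Read side: a number equal to the table size introduces a new string;
// a smaller number refers to a string that has already been read.
struct string_reader_sketcht
{
  std::vector<std::string> table;

  std::string read(const std::vector<std::string> &tokens, std::size_t &pos)
  {
    const std::size_t number = std::stoul(tokens.at(pos++));
    if(number == table.size())
      table.push_back(tokens.at(pos++));
    return table.at(number);
  }
};
```

Writing `a`, `b`, `a` yields the tokens `0 a 1 b 0`: the second `a` costs only its number.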

- `read_bin_goto_object.h`
- `read_bin_goto_object.cpp`

First two modules are responsible for location of the stream with the
Member

The first two


First two modules are responsible for location of the stream with the
serialised data within a passed file. And the remaining two modules
perform the actual deserialisation of `goto_modelt` instance from
Member

of a

\subsubsection subsection-goto-binary-is-binary-file Checking file type

You can use function `is_goto_binary` to check whether a passed file contains
a sederialised `goto_modelt` instance or not. This is done by checking the
Member

deserialised
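A rough sketch of the header check, reading from a stream rather than a file. `has_goto_binary_magic_sketch` is hypothetical, not the signature of `is_goto_binary`. It only compares the first four bytes with the magic number (`0x7f` followed by `GBF`) from the header layout quoted at the top of this thread. It does not cover the ELF and Mach-O containers discussed in this review.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical check: true if the stream starts with 0x7f 'G' 'B' 'F'.
bool has_goto_binary_magic_sketch(std::istream &in)
{
  char header[4];
  if(!in.read(header, 4))
    return false; // fewer than four bytes available
  return static_cast<unsigned char>(header[0]) == 0x7f && header[1] == 'G' &&
         header[2] == 'B' && header[3] == 'F';
}
```

An ELF file also starts with `0x7f`, followed by `ELF`, so the check has to compare all four bytes rather than the first one alone.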

characters `0` and `\\` are escaped by writing additional `\\`
before them.

This is implemented in the function `write_gb_string` and the
Member

Prefix function with class name so that it is linked by doxygen
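The escaping rule in this hunk can be sketched as a write/read pair. The sketch assumes that `0` means the zero byte that terminates each string. The helper names are hypothetical and are not the function this comment refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Zero-terminated string with escaping: the zero byte and the backslash
// are each preceded by an extra backslash, so an unescaped zero byte can
// only mean "end of string".
std::string write_escaped_sketch(const std::string &s)
{
  std::string out;
  for(char c : s)
  {
    if(c == '\0' || c == '\\')
      out.push_back('\\'); // escape the special character
    out.push_back(c);
  }
  out.push_back('\0'); // terminator
  return out;
}

// Inverse of write_escaped_sketch; pos is advanced past the terminator.
std::string read_escaped_sketch(const std::string &in, std::size_t &pos)
{
  std::string s;
  while(pos < in.size())
  {
    char c = in[pos++];
    if(c == '\0')
      return s; // unescaped zero byte: end of string
    if(c == '\\')
    {
      if(pos >= in.size())
        break;
      c = in[pos++]; // escaped character, taken literally
    }
    s.push_back(c);
  }
  throw std::runtime_error("unterminated string");
}
```

Because every zero byte inside the payload is escaped, the reader can stop at the first unescaped zero without knowing the length in advance.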

@marek-trtik
Contributor Author

Requested updates are available.

@peterschrammel
Member

@marek-trtik, please rebase.

@marek-trtik
Contributor Author

@peterschrammel Done.

@tautschnig
Collaborator

@marek-trtik Would you mind squashing your commits? Then this one should be ready to be merged.

@marek-trtik
Contributor Author

@tautschnig Done.

Contributor

@allredj left a comment

This PR failed Diffblue compatibility checks (cbmc commit: 6cd614b).
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the meantime (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

@tautschnig merged commit a6d2e2a into diffblue:develop on Sep 5, 2018
4 participants