Skip to content

Improvements to goto-analyzer's constant propagator and simplifier [blocks: #2522] #2132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 19, 2018

Conversation

tautschnig
Copy link
Collaborator

@tautschnig tautschnig commented Apr 27, 2018

Previously only exprt::id() == ID_constant was deemed "constant", but several other expressions (in particular: the address of an object) are also constants that can safely be propagated. The interpretation of what is (or isn't) constant now largely overlaps with what goto-symex considers a constant.

Dependencies:

@smowton
Copy link
Contributor

smowton commented Apr 30, 2018

Looks sensible, but there ought to be corresponding new tests for the new kinds of propagation we intend to support.

Copy link
Contributor

@thk123 thk123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @smowton this needs some tests - also dunno how much overlap there is with this + variable sensitivity domain enhancements to constant propagation @chrisr-diffblue

@@ -72,4 +72,7 @@ bool has_subtype(const typet &, const irep_idt &id, const namespacet &);
/// lift up an if_exprt one level
if_exprt lift_if(const exprt &, std::size_t operand_number);

/// returns true if the expression can be used as an lvalue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as I hate having to add the comment, the coding standards say docs should be on the implementation.

@@ -80,7 +81,7 @@ bool replace_symbolt::replace(
{
const exprt &e=it->second;

if(!replace_with_const && e.is_constant()) // Would give non l-value.
if(!replace_with_const && !is_lvalue(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't belong in this commit as this is a non-trivial change,

Previously required replace_with_const == false AND e.id() == ID_constant. Now requires replace_with_const == false AND a super set of the original possible es.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @thk123 I'm reviewing this patch by patch and yes, there is an issue here. I'm happy to resolve it by seeing where else in the patch set it would fit rather than moving it but still.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a separate PR now: #2200.

expr.id() == ID_plus || expr.id() == ID_array || expr.id() == ID_struct ||
expr.id() == ID_union)
{
forall_operands(it, expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought forall_operands was now deprecated? Could use:

return std::all_of(
  expr.operands().begin(), 
  expr.operands().end(), 
  [](const exprt &) { return is_constant(expr); });

To make the condition more explicit

{
return is_constant(expr.op0());
}
else if(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ID_minus, ID_multiply etc? Or am I missing something

@tautschnig tautschnig force-pushed the extend-simplifier branch from 197f76d to 1abe91a Compare May 1, 2018 19:01
@tautschnig tautschnig self-assigned this May 2, 2018
@tautschnig tautschnig force-pushed the extend-simplifier branch from 1abe91a to 626804b Compare May 16, 2018 16:35
@tautschnig
Copy link
Collaborator Author

Tests have now been added - without adding any new tests...! I have gone through various regression tests and moved from FUTURE to CORE those tests that the individual commits now render successful.

@tautschnig tautschnig force-pushed the extend-simplifier branch from 626804b to 7ce5312 Compare May 16, 2018 21:58
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes but more than that, I'd like to talk about the patch that loops calls to AI.

^Simplified: assert: 1, assume: 0, goto: 3, assigns: 4, function calls: 0$
^Unmodified: assert: 0, assume: 0, goto: 0, assigns: 11, function calls: 2$
^Simplified: assert: 1, assume: 0, goto: 1, assigns: 6, function calls: 0$
^Unmodified: assert: 0, assume: 0, goto: 1, assigns: 9, function calls: 2$
--
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this change matches the commit message of the commit it is in but I guess it is being updated elsewhere in the patch set, in which case it's a bit unnecessary to change it.

@@ -80,7 +81,7 @@ bool replace_symbolt::replace(
{
const exprt &e=it->second;

if(!replace_with_const && e.is_constant()) // Would give non l-value.
if(!replace_with_const && !is_lvalue(e))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @thk123 I'm reviewing this patch by patch and yes, there is an issue here. I'm happy to resolve it by seeing where else in the patch set it would fit rather than moving it but still.

to_side_effect_expr(expr).get_statement()==ID_nondet)
return false;
if(expr.is_constant())
return true;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we could share the code between this and symex rather than keeping the two maintained in parallel? expr_util.h?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this a try, even though symex has some peculiarities around plus and multiplication just so as to feed more information to calls to malloc.

@@ -18,7 +18,7 @@ class optionst;

bool static_simplifier(
goto_modelt &,
const ai_baset &,
ai_baset &,
const optionst &,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I think we need to talk about this one.


return b;
while(!values.replace_const.replace(condition))
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test case for this? As in; do you know where it is needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's needed when local constant propagation yields a subexpression that is not an lvalue, as in the following:

int x = 1;
int *p = &x;
x = *p;

Now the first substitution+simplification will be x = x; when only the second round will yield x = 1;.

" \nhavoc[,params:<regex>][,globals:<regex>]\n" \
" nondet-return, assert-false-assume-false " \
"and\n" \
" havoc[,params:<regex>][,globals:<regex>]\n" \
" (default: nondet-return)"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but, yes.

@@ -220,6 +221,10 @@ bool static_simplifier(
<< ", function calls: " << unmodified.function_calls
<< messaget::eom;

// restore return types before writing the binary
restore_returns(goto_model);
goto_model.goto_functions.update();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I know I keep saying this BUT we (in the broadest sense) do need to work out what the valid forms of goto-program are at various points in the tool. This reads to me a little bit like "returns must always be present when on disk".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not as much about returns, but more about the types: I don't think the simplifier should yield a binary where symbols have different types. Or at least that's not what I'd expect from a "simplifier".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair.

"to end of function");
i_it=goto_program.instructions.insert(i_it, *i_it);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again; this looks like another "we need to clarify our invariants" issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, though in this case that was code I had originally corrupted^W^Wwritten with fixed assumptions about its structure. Those assumptions were unnecessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fair.

@martin-cs
Copy link
Collaborator

I have two things against the looped calls to AI patch (the rest I'm happy to accept (although having the constant detection shared rather than duplicated would be awesome)):

  1. It rather breaks the idea of goto-analyze as "run AI; then do task". That's not the end of the world but it would be nice and more specifically I'd want a good reason for it. Which brings me to...

  2. Isn't this just papering over the cracks? I think simplify should be idempotent. I think it not being is indicative of a bug / lack of precision. @hannes-steffenhagen-diffblue very kindly did some work on this with the variable sensitivity domain and found a number of issues. So I'm kind of inclined to look at the cases where repeated simplification helps and see if they are actually issues with the domain / simplification code.

@martin-cs
Copy link
Collaborator

( I realise this week I am "block and discuss" on every patch but ... )

@tautschnig
Copy link
Collaborator Author

( I realise this week I am "block and discuss" on every patch but ... )

I do strongly prefer to have good quality code so all those inputs are highly appreciated! So ...

  1. Isn't this just papering over the cracks? I think simplify should be idempotent. I think it not being is indicative of a bug / lack of precision. @hannes-steffenhagen-diffblue very kindly did some work on this with the variable sensitivity domain and found a number of issues. So I'm kind of inclined to look at the cases where repeated simplification helps and see if they are actually issues with the domain / simplification code.

I don't think it's as simple as that; in fact simplify, as far as I can tell, is idempotent. There might be some bugs left, but that's not what this looping is about. The key bit is re-running constant propagation (or indeed whatever domain) after having done simplification. Consider a reduced version of one of the regression tests:

int i = 0;
if(i == 0) ++i;
assert(i == 1);

The first iteration of constant propagation will determine the value of i for all instructions in line 2, but cannot resolve the assertion, because there still is a join point leaving the value of i non-constant at that point. Thus the result of the first run is

int i = 0;
i = 1;
assert(i == 1);

Now the branching is gone, and constant propagation plus simplification can happily sort out the assertion. I am all ears to hear about better solutions, but from my point of view a fixed point around the combination of abstraction interpretation and simplification is required here.

@martin-cs
Copy link
Collaborator

martin-cs commented May 17, 2018 via email

@martin-cs
Copy link
Collaborator

martin-cs commented May 17, 2018 via email

@tautschnig
Copy link
Collaborator Author

[...]

Consider a reduced version of one of the regression tests:

int i = 0;
if(i == 0) ++i;
assert(i == 1);

The first iteration of constant propagation will determine the value of i for all instructions in line 2, but cannot resolve the assertion, because there still is a join point leaving the value of i non-constant at that point.

I see that as a precision issue. I think only one branch of the if
statement should be taken because i == 0 can be resolved by the
domain. Thus one side will be \bot and so the assertion can be reduced
to assert(true).

Ok, but that's not what the domain is doing right now? So it's a but in the constant-propagation domain?

Thus the result of the first run is

int i = 0;
i = 1;
assert(i == 1);

Now the branching is gone, and constant propagation plus simplification can happily sort out the assertion. I am all ears to hear about better solutions, but from my point of view a fixed point around the combination of abstraction interpretation and simplification is required here.

Do you have examples of this causing a problem? My next real
development task is finishing off goto-analyze and the variable
sensitivity domain. It is my belief that all cases of needing multiple
iterations I can fix with the VSD. Let's give it a go and see if we can
solve this by finding real programs that have this issue.

Yes, I do (as otherwise admittedly wouldn't have touched this code...). My workflow includes using the static unwinder that @danpoe built before applying the simplifier. I'm all ears if there is a better solution. I might actually filter out the undisputed bug fixes from this PR into a new one so that those can go in in the meantime.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫
This PR failed Diffblue compatibility checks (cbmc commit: d07507b).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/91427232
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 03dfcb0).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/91431166

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- any thoughts on how this interacts with the improvements in #2522 ?

@tautschnig
Copy link
Collaborator Author

any thoughts on how this interacts with the improvements in #2522 ?

Let me comment on #2522, many thanks for refreshing my memory on that.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 9b16c48).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/91505755

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; looks much better.

@@ -1,4 +1,4 @@
KNOWNBUG
FUTURE
main.c
--variable --pointers --arrays --structs --verify
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be future, also, the names for options should change.

main.c
--constants --simplify out.gb
--constants --verify
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this changed task?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness: I have changed the task here as simplification did always take place, but not enough of it to actually turn the assertion into a firm FALSE.

main.c
--constants --simplify out.gb
--constants --verify
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why change task?

@@ -1,9 +1,9 @@
FUTURE
KNOWNBUG
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change, although not wrong, doesn't seem to match the commit message.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 731c3bf).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/93273597

This option is not currently supported, there is no known bug here.
Several tests already pass without any code modifications.
The "values" parameter shadowed the class member values, which, however, was
used by functions called by assign_rec. Thus they possibly operate on two
different sets of values.
It has only been disabled on suspicion of being at fault for bugs, but these are
quite possibly due to other bugs (cf. soundness fixes later in this series).
The included regression test is an example the requires interleaving of
simplification and constant propagation.
1. Properly handle index, member, or dereference expressions on the left-hand
side of assignments.
2. If we find something on the left-hand side that we don't support (maybe byte
extracts, if-expressions, type casts, or maybe something else), set the entire
value set to top to ensure soundness.
We still fall back to setting values to top if dereferencing fails.
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 2359689).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/95302081

@tautschnig tautschnig merged commit 053ae9c into diffblue:develop Dec 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants