Skip to content

Fix handling of DEAD instructions and function call inlining #6473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 30, 2021

Conversation

SaswatPadhi
Copy link
Contributor

@SaswatPadhi SaswatPadhi commented Nov 24, 2021

In this PR, we fix 3 minor issues related to code contracts:

  1. an overflow error in handling DEAD instructions during write set inclusion checking.
  2. refactor and optimize function call inlining during code contracts instrumentation
  3. extend write set inclusion checks to loop head instructions (in addition to loop body) to check for possible side effects in the guard
  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@SaswatPadhi SaswatPadhi added bugfix aws Bugs or features of importance to AWS CBMC users aws-high Code Contracts Function and loop contracts labels Nov 24, 2021
@SaswatPadhi SaswatPadhi self-assigned this Nov 24, 2021
@codecov
Copy link

codecov bot commented Nov 24, 2021

Codecov Report

Merging #6473 (abd36cf) into develop (2a9e3e2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #6473   +/-   ##
========================================
  Coverage    76.02%   76.02%           
========================================
  Files         1546     1546           
  Lines       165352   165359    +7     
========================================
+ Hits        125711   125717    +6     
- Misses       39641    39642    +1     
Impacted Files Coverage Δ
src/goto-instrument/contracts/contracts.cpp 97.10% <100.00%> (+0.35%) ⬆️
src/goto-instrument/loop_utils.cpp 82.85% <0.00%> (-8.58%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6b3118...abd36cf. Read the comment docs.

When handling DEAD instructions in function/loop assigns clause
instrumentation, the instruction iterator was not being incremented
correctly.
This led to instrumentation outside of the function/loop scope, and
spurious write set inclusion violations.

Moreover, for loops, declarations of some temporaries (those involved in
the "initialization" of loop counter, for instance) are "outside" of the
loop identified by CBMC, so we no longer raise an exception on not
finding a corresponding DECL for each DEAD.
These variables are writable because they appear as assigns clause
targets, not because they were local DECLs.
Function call inlining was earlier performed on the same function
multiple times if it had multiple loops.
We refactor the inlining call out and only inline the function once,
even if it has multiple loops.
Previously, only loop body was being instrumented for write set
inclusion checks. This could miss checking writes performed by the loop
guard, if it has side effects.
In this PR, we also instrument the loop guard with inclusion checks.
Copy link
Collaborator

@remi-delmas-3000 remi-delmas-3000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing some tests with actual side effects in the guard evaluation, and unresolved issue with decreases clause evaluation

// FIXME: This simple approach wouldn't work when
// the loop guard in the source file is split across multiple lines.
const auto head_loc = loop_head->source_location();
while(loop_head->source_location() == head_loc)
while(loop_head->source_location() == head_loc ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fragile. An alternative solution could be to have the C front end inject LOCATION markers to precisely delimit loop entry, loop guard evaluation, loop condition test, loop body, loop step instructions, jump back to head instructions.

Copy link
Contributor Author

@SaswatPadhi SaswatPadhi Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @martin-cs suggested something similar (keeping some markers around), in my other PR.

I would suggest not to fix this this in the same PR especially because it would touch the C front end (parsing functions etc.) This PR is tiny and it only fixes the assigns clause issues (with DEAD) and makes it play well with function call inlining (modulo the existing issue with loop body finding).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with both. I have already had Opinions at @SaswatPadhi about loop structure detection (@remi-delmas-3000 did I CC you? If not I can resend. ); this is fragile and is asking for trouble and inconsistency with the rest of CPROVER. I think this functionality should be factored out and used in all appropriate places.

But I also agree with @SaswatPadhi ; that is outside of the scope of this PR.


void main()
{
for(int i = lowerbound(); i < upperbound(); incr(&i))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we also add another simple test such as

int main()
{
   unsigned int max;
   assume(max > 0);
   unsigned int i=0;  
   while( i < max++)
   loop_decreases(max-i)
   {
     i++;
   }
}

This loop's decreases clause is incorrect, max and i can both overflow
(but the loop still terminates when max overflows and becomes smaller than i or stays at max_int and i starts catching up on it)
We should be able to find all these errors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that you should be able to find these and agreed that this would be worth adding as a test.
Not entirely sure how this connects to the specific issue being fixed though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a separate issue from the one addressed in this PR. @remi-delmas-3000 is there an issue where we track this?

Comment on lines +688 to +693
if(!natural_loops.loop_map.size())
return;

goto_function_inline(
goto_functions, function_name, ns, log.get_message_handler());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inlining after loop detection will miss loops that are hidden behind function calls that do not have contracts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is okay? because they don't have contracts anyway?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object but do note that this kind of inlining can get expensive.

Comment on lines +1090 to +1093
log.warning() << "Found a `DEAD` variable "
<< name2string(symbol.get_identifier())
<< " without corresponding `DECL`, at: "
<< instruction_it->source_location() << messaget::eom;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to warn about constructs the user has no control over ? How could we distinguish such expected cases from really incorrect cases ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably use log.debug or log.info, yes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I am not sure it is "logging at run-time to the user" issue, in part because, as @remi-delmas-3000 says, it is not clear what the user is supposed to do with this information.

This feels like a DATA_INVARIANT of a goto-programs; you can only DEAD a variable that you have previously DECLd. This seems reasonable and feels like the kind of thing that should be in the --validate-goto-model checks.

@SaswatPadhi
Copy link
Contributor Author

Thanks for taking a look, @remi-delmas-3000

missing some tests with actual side effects in the guard evaluation, and unresolved issue with decreases clause evaluation

Guards with side effects were out of scope until now. This PR only changes assigns clause instrumentation to happen on the entire loop that CBMC identifies, rather than just the body. We haven't tested loops with guards that have side effects though, and there could be other issues with the invariant assumption / havoc statement / base case assertion placement, that we might want to fix before we can get decreases clauses working.

I would suggest keeping this PR only about assigns clause fixes (both function & loop contracts) and making separate PRs each for fixing issues with loops that with guards that have side effects, handling do/while loops etc.

@remi-delmas-3000
Copy link
Collaborator

approving to merge but we need to keep in mind the issues mentioned in my review

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not going to block this over the logging but please could you address it?

( Also thanks for the really well split-out changes and clear commit messages, makes it so much easier to read. )


void main()
{
for(int i = lowerbound(); i < upperbound(); incr(&i))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that you should be able to find these and agreed that this would be worth adding as a test.
Not entirely sure how this connects to the specific issue being fixed though.

Comment on lines +1090 to +1093
log.warning() << "Found a `DEAD` variable "
<< name2string(symbol.get_identifier())
<< " without corresponding `DECL`, at: "
<< instruction_it->source_location() << messaget::eom;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I am not sure it is "logging at run-time to the user" issue, in part because, as @remi-delmas-3000 says, it is not clear what the user is supposed to do with this information.

This feels like a DATA_INVARIANT of a goto-programs; you can only DEAD a variable that you have previously DECLd. This seems reasonable and feels like the kind of thing that should be in the --validate-goto-model checks.

// FIXME: This simple approach wouldn't work when
// the loop guard in the source file is split across multiple lines.
const auto head_loc = loop_head->source_location();
while(loop_head->source_location() == head_loc)
while(loop_head->source_location() == head_loc ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with both. I have already had Opinions at @SaswatPadhi about loop structure detection (@remi-delmas-3000 did I CC you? If not I can resend. ); this is fragile and is asking for trouble and inconsistency with the rest of CPROVER. I think this functionality should be factored out and used in all appropriate places.

But I also agree with @SaswatPadhi ; that is outside of the scope of this PR.

Comment on lines +688 to +693
if(!natural_loops.loop_map.size())
return;

goto_function_inline(
goto_functions, function_name, ns, log.get_message_handler());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object but do note that this kind of inlining can get expensive.


void main()
{
for(int i = lowerbound(); i < upperbound(); incr(&i))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a separate issue from the one addressed in this PR. @remi-delmas-3000 is there an issue where we track this?

@feliperodri feliperodri merged commit 1a1245e into diffblue:develop Nov 30, 2021
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was too late to the party: I'll create a follow-up PR to fix the issues I called out.

// they must appear as assigns targets anyway,
// but their DECL statements are outside of the loop.
log.warning() << "Found a `DEAD` variable "
<< name2string(symbol.get_identifier())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name2string seems to be rarely used (to the extent that I'm wondering whether we should get rid of it). I'd suggest you use id2string instead, though here you actually don't need either!

return;

goto_function_inline(
goto_functions, function_name, ns, log.get_message_handler());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline: calls to goto_function_inline will require goto_functions.update() to be called afterwards.

// Insert aliasing assertions
// Inline all function calls.
goto_function_inline(
goto_functions, function_obj->first, ns, log.get_message_handler());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above: call goto_functions.update()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high bugfix Code Contracts Function and loop contracts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants