Skip to content

Assertions that were never touched are reported as SUCCESS by CBMC #2684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
karkhaz opened this issue Aug 5, 2018 · 9 comments · Fixed by #3237
Closed

Assertions that were never touched are reported as SUCCESS by CBMC #2684

karkhaz opened this issue Aug 5, 2018 · 9 comments · Fixed by #3237

Comments

@karkhaz
Copy link
Collaborator

karkhaz commented Aug 5, 2018

Supose we do path exploration on this program

int main()
{
  int x;
  if(x)
    assert(0);
  else
    assert(0);
}

by running

cbmc  --paths lifo /tmp/foo.c

the output indicates that both paths were explored. However, in each case, the assertion on the other path is printed as being successful, even though it clearly fails, because that path was not the one currently under consideration:

...
** Results:
[main.assertion.1] assertion 0: SUCCESS
[main.assertion.2] assertion 0: FAILURE

** 1 of 2 failed (2 iterations)
VERIFICATION FAILED
___________________________
Starting new path (1 to go)
...
** Results:
[main.assertion.1] assertion 0: FAILURE
[main.assertion.2] assertion 0: SUCCESS

** 1 of 2 failed (2 iterations)
VERIFICATION FAILED

we should not print a spurious SUCCESS result for assertions that lie on a path that we're not currently exploring.

@martin-cs
Copy link
Collaborator

I disagree. BMC and symex are fundamentally underapproximate analyses, so FAILURE means "we know, for sure, that there is an execution traces that reaches this point and falsifies the condition", while SUCCESS is the converse; the analysis has not found a path that reaches it and falsifies it. In the example you give this is completely correct; if you are only exploring one branch then the other assertion is unreachable and so SUCCESS is correct. By increasing the search (more unwinding, more depth, more paths, whatever) you can change from SUCCESS to FAILURE but not vica-versa.

Now; should we make that a lot more clear -- probably yes. Should we move to a three-valued system of ERROR_TRACE_EXISTS, HAVE_NOT_EXPLORED_ALL_POSSIBILITES and NO_ERROR_TRACE_EXISTS -- maybe. But for now I don't think that suppressing outputs is a step in the right direction.

@karkhaz
Copy link
Collaborator Author

karkhaz commented Aug 6, 2018

Ok, perhaps "not print spurious SUCCESS" is too strong. I was actually thinking to replace it with UNKNOWN or UNREACHED or similar, so that SUCCESS (in the context of path exploration) means "we touched this assertion along this path and couldn't find a way to falsify it", as opposed to "we didn't reach this assertion at all and didn't even bother trying to falsify it".

OTOH, given that the --paths flag is opt-in, I feel that the user might consider the (many thousands of) UNKNOWN assertions on each path to be noise. When I've been running path exploration, I get annoyed by the zillions of irrelevant SUCCESS messages and I'm not sure that I'd care so much to see just as many UNKNOWN messages either.

Compromise suggestion: introduce three values, and have a flag to supress UNKNOWN? I don't have a strong opinion on whether the flag is default true or false, as long as it's there. How does that sound?

@martin-cs
Copy link
Collaborator

I'm tempted to approach this in two parts:

  1. Decide whether the results of the tools should be relative to the kind of analysis (as now) or whether they should be uniform across all tools. This is a serious policy question and one that will need proper consultation, it should not be changed through a PR to tool output.

  2. If we decide to change then we need a very clear, well documented set of outputs. I would start from the statement "The tool has found an execution trace that we know reaches the location and causes the assertion to fail", this breaks down into three parts:

*. "has found"
*. "known reachable"
*. "known assertion condition failure"

For example, goto-analyze when used in verify mode, will output four statuses:

SUCCESS : may be reachable, condition true in all of these cases
FAILURE (if reachable) : may be reachable, condition false in all of these cases
SUCCESS (unreachable) : definitely not reachable
UNKNOWN : may be reachable and condition may be false in some of these.

But note that different kinds of analysis may be able to give different answers and the difficulty of each answer for each kind of analysis may vary.

@karkhaz
Copy link
Collaborator Author

karkhaz commented Aug 6, 2018

That sounds very reasonable. I don't think I'm super-qualified to add much to the policy question, especially across the range of cprover tools that I'm not familiar with; I can chime in about path exploration if need be. What's the best way to proceed---unless there's a better forum to discuss this, would you ping everybody who should be involved in this discussion on this bug report?

Note that I don't think that this is a super-urgent question for e.g. SV-COMP, since the programatic return code of CBMC is basically correct, but I do agree that it's important not to confuse users. I'd be happy to rework the path exploration code and contribute documentation to reflect whatever we end up deciding.

@martin-cs
Copy link
Collaborator

Unless we wind up with most / everyone in the same place at the same time, this is probably the best place. To change 1. I think we'd need a set of people behind it including a large subset of @tautschnig @peterschrammel @smowton @chrisr-diffblue @kroening .

@tautschnig
Copy link
Collaborator

Just adding a note that SUCCESS may be spurious with CBMC in many cases when loop/recursion unwinding is incomplete, which is what @martin-cs has elaborated on above. I am thus changing the title of this issue to remove the restriction on path exploration.

@tautschnig tautschnig changed the title Assertions that were never touched are reported as SUCCESS when doing path exploration Assertions that were never touched are reported as SUCCESS by CBMC Aug 11, 2018
@peterschrammel
Copy link
Member

peterschrammel commented Aug 11, 2018

The current policy of CBMC (with BMC) is simple and transparent:

  • FAILURE: at least one of the assertions is violated by at least one input
  • SUCCESS: none of the assertions is violated by any input under the given under-approximations (restriction on loop unwindings etc)

The discussion with @karkhaz came up because with the paths option assertions appear as SUCCESS although they might be violated by some input. This is confusing because I am less aware of the under-approximations being made in this situation - i.e. it is not as easy to understand what the analysis has actually done. Without inspecting the symex coverage information I cannot decide whether SUCCESS means 'proven not to be violated ever', 'unknown because beyond my (deliberately) given underapproximations' or 'not analyzed because the path has not been explored by the algorithm'.

@karkhaz
Copy link
Collaborator Author

karkhaz commented Aug 11, 2018

I think I understand your position a lot better now! I think you and I have different mental models of what CBMC is trying to achieve when doing path exploration.

I think @peterschrammel views path exploration as "just another underaproximation," such that even though we haven't falsified an assertion yet (because we didn't explore its path yet, didn't unwind enough, or any other reason), we still might get a chance to falsify it in the future. In this view, CBMC is still analysing the whole program, but merely printing the results out one at a time.

To me, this suggests an error output different from what anyone has suggested so far: a cummulative one, where failed assertions discovered on previous paths are printed out along with any new failed assertions along this path. So for the following program:

int main()
{
  int x, y;
  if(x)
    if(y)
      assert(0);
    else
      assert(0);
  else
    if(y)
      assert(0);
    else
      assert(0);
}

the error output along each of the four paths would look like

assertion 1: FAILED
assertion 2: SUCCESS
assertion 3: SUCCESS
assertion 4: SUCCESS
...
assertion 1: FAILED
assertion 2: SUCCESS
assertion 3: FAILED
assertion 4: SUCCESS
...
assertion 1: FAILED
assertion 2: FAILED
assertion 3: FAILED
assertion 4: SUCCESS
...
assertion 1: FAILED
assertion 2: FAILED
assertion 3: FAILED
assertion 4: FAILED

My view was somewhat different: I considered CBMC's run along each path to be totally independent of any previous runs, as if we 'slice away' the rest of the program and only try to verify this path, for now. It's as if, on any particular path, we have a ghost assume(x == 1); assume(y == 0) or whatever at the beginning of the program, and we want to verify the program under those assumptions; and future paths would have different assumptions.

In that view, it's mildly astonishing when CBMC even prints out any other assertions, because I thought that CBMC was only symbolically executing the instructions that would be run given the initial assumptions. In fact, when I was writing the path exploration code, I thought that there was a bug in my code---I thought that CBMC was actually exploring those paths!

So: to me, the question is: are users' mental models more similar to @peterschrammel's or mine? I'd argue that @peterschrammel's is closer to what CBMC already does, but mine seems more intuitive given that the user explicitly passed in the --paths flag and expects CBMC to explore the instructions one path at a time. But whichever way, I think this is a question of "we shouldn't surprise the users, and we thus need to know what their expectation/mental model is".

@peterschrammel
Copy link
Member

peterschrammel commented Aug 11, 2018

a +cummulative+ one

Exactly.

expects CBMC to explore the instructions one path at a time.

The difficulty that I have as a user is that I have little visibility which path is being explored, but this information is crucial for correctly interpreting the output. I am not really interested whether it explores "one path at a time" or it is doing something else - the only thing I care about is the status of the assertions, which should be easily interpretable and not depend on how the algorithm works.

karkhaz added a commit to karkhaz/cbmc that referenced this issue Oct 28, 2018
CBMC will no longer bother running safety checks on path segments that
did not generate any new VCCs when running in path-exploration mode.

A "path segment" is a list of instructions between two program branch
points. CBMC would previously pause symbolic execution and run safety
checks at every branch point, which is wasteful when no additional VCCs
have been generated since the last pause. As of this commit, CBMC will
check to see if new VCCs have been added, and only do the safety check
if so.

To support this, the new member goto_symext::path_segment_vccs records
the number of VCCs that were newly generated along this path segment. It
differs from the VCC counters in goto_symex_statet, which hold the
number of VCCs generated along the current path since the start of the
program. goto_symext objects are minted for each path segment, so they
don't hold any information about the VCCs previously along the path.

This commit also removes the "Starting new path" output, since this no
longer makes much sense. On the other hand, there is an additional final
status output; CBMC will report failure after the entire program has
been explored if even a single path contained a failing assertion, and
report success otherwise.

The unit tests have been updated to reflect the fact that CBMC no longer
emits "SUCCESS" at each branch point and at the end of paths, and also
that CBMC makes one final report of the program safety as a whole.

This commit fixes diffblue#2684.
karkhaz added a commit to karkhaz/cbmc that referenced this issue Oct 28, 2018
CBMC will no longer bother running safety checks on path segments that
did not generate any new VCCs when running in path-exploration mode.

A "path segment" is a list of instructions between two program branch
points. CBMC would previously pause symbolic execution and run safety
checks at every branch point, which is wasteful when no additional VCCs
have been generated since the last pause. As of this commit, CBMC will
check to see if new VCCs have been added, and only do the safety check
if so.

To support this, the new member goto_symext::path_segment_vccs records
the number of VCCs that were newly generated along this path segment. It
differs from the VCC counters in goto_symex_statet, which hold the
number of VCCs generated along the current path since the start of the
program. goto_symext objects are minted for each path segment, so they
don't hold any information about the VCCs previously along the path.

This commit also removes the "Starting new path" output, since this no
longer makes much sense. On the other hand, there is an additional final
status output; CBMC will report failure after the entire program has
been explored if even a single path contained a failing assertion, and
report success otherwise.

The unit tests have been updated to reflect the fact that CBMC no longer
emits "SUCCESS" at each branch point and at the end of paths, and also
that CBMC makes one final report of the program safety as a whole.

This commit fixes diffblue#2684.
karkhaz added a commit to karkhaz/cbmc that referenced this issue Oct 29, 2018
CBMC will no longer bother running safety checks on path segments that
did not generate any new VCCs when running in path-exploration mode.

A "path segment" is a list of instructions between two program branch
points. CBMC would previously pause symbolic execution and run safety
checks at every branch point, which is wasteful when no additional VCCs
have been generated since the last pause. As of this commit, CBMC will
check to see if new VCCs have been added, and only do the safety check
if so.

To support this, the new member goto_symext::path_segment_vccs records
the number of VCCs that were newly generated along this path segment. It
differs from the VCC counters in goto_symex_statet, which hold the
number of VCCs generated along the current path since the start of the
program. goto_symext objects are minted for each path segment, so they
don't hold any information about the VCCs previously along the path.

This commit also removes the "Starting new path" output, since this no
longer makes much sense. On the other hand, there is an additional final
status output; CBMC will report failure after the entire program has
been explored if even a single path contained a failing assertion, and
report success otherwise.

The unit tests have been updated to reflect the fact that CBMC no longer
emits "SUCCESS" at each branch point and at the end of paths, and also
that CBMC makes one final report of the program safety as a whole.

This commit fixes diffblue#2684.
karkhaz added a commit to karkhaz/cbmc that referenced this issue Oct 30, 2018
CBMC will no longer bother running safety checks on path segments that
did not generate any new VCCs when running in path-exploration mode.

A "path segment" is a list of instructions between two program branch
points. CBMC would previously pause symbolic execution and run safety
checks at every branch point, which is wasteful when no additional VCCs
have been generated since the last pause. As of this commit, CBMC will
check to see if new VCCs have been added, and only do the safety check
if so.

To support this, the new member goto_symext::path_segment_vccs records
the number of VCCs that were newly generated along this path segment. It
differs from the VCC counters in goto_symex_statet, which hold the
number of VCCs generated along the current path since the start of the
program. goto_symext objects are minted for each path segment, so they
don't hold any information about the VCCs previously along the path.

This commit also removes the "Starting new path" output, since this no
longer makes much sense. On the other hand, there is an additional final
status output; CBMC will report failure after the entire program has
been explored if even a single path contained a failing assertion, and
report success otherwise.

The unit tests have been updated to reflect the fact that CBMC no longer
emits "SUCCESS" at each branch point and at the end of paths, and also
that CBMC makes one final report of the program safety as a whole.

This commit fixes diffblue#2684.
xbauch pushed a commit to xbauch/cbmc that referenced this issue Nov 9, 2018
CBMC will no longer bother running safety checks on path segments that
did not generate any new VCCs when running in path-exploration mode.

A "path segment" is a list of instructions between two program branch
points. CBMC would previously pause symbolic execution and run safety
checks at every branch point, which is wasteful when no additional VCCs
have been generated since the last pause. As of this commit, CBMC will
check to see if new VCCs have been added, and only do the safety check
if so.

To support this, the new member goto_symext::path_segment_vccs records
the number of VCCs that were newly generated along this path segment. It
differs from the VCC counters in goto_symex_statet, which hold the
number of VCCs generated along the current path since the start of the
program. goto_symext objects are minted for each path segment, so they
don't hold any information about the VCCs previously along the path.

This commit also removes the "Starting new path" output, since this no
longer makes much sense. On the other hand, there is an additional final
status output; CBMC will report failure after the entire program has
been explored if even a single path contained a failing assertion, and
report success otherwise.

The unit tests have been updated to reflect the fact that CBMC no longer
emits "SUCCESS" at each branch point and at the end of paths, and also
that CBMC makes one final report of the program safety as a whole.

This commit fixes diffblue#2684.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants