Skip to content

c2rust-analyze: initial implementation of error recovery #876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
May 22, 2023

Conversation

spernsteiner
Copy link
Collaborator

This branch adds a basic implementation of error recovery: if the static analysis panics on some function, it now catches the panic, marks that function as invalid, and continues with analysis of the remaining functions. At the end, after completing as much analysis and rewriting as it could, the tool prints a report of all the panics it encountered, including a stack trace for each one.

Currently, when marking a function as failed/invalid, we immediately mark all of its (transitive) callers invalid as well. In the future we can probably relax this, falling back on something like Trivial or on dynamic analysis results to let the callers reason about the unanalyzed function.

The current approach produces a lot of invalid Rust code. Suppose f calls g, f fails analysis, but g succeeds and gets rewritten. Currently we don't rewrite f at all due to the previous failure, and if g's signature gets rewritten, f's call to g will no longer be valid. In the future we'll need a better scheme for calling back and forth between analyzed and unanalyzed functions.

@spernsteiner spernsteiner requested review from kkysen and aneksteind and removed request for kkysen March 29, 2023 18:10
Copy link
Contributor

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to change the existing feature-specific tests to check that there were no caught panics, rather than just not panicking. That could be in a follow-up PR, but I think it should merge right after this one so we're not left in an intermediate state where tests can silently fail.

@kkysen
Copy link
Contributor

kkysen commented Mar 29, 2023

In the future we can probably relax this, falling back on something like Trivial

What do you mean by this? I was planning on entirely removing Triviality (#874) as the current version is unsound and we were having trouble determining a sound version, and I was just going to hardcode the libc functions.

@spernsteiner
Copy link
Collaborator Author

We also need to change the existing feature-specific tests to check that there were no caught panics

Which tests are these? I'd prefer to fix it in this PR so master isn't put into an inconsistent state.

What do you mean by this? I was planning on entirely removing Trivial

I haven't fully thought this through, but I think we may want a way to approximate the behavior of a function g on its caller f even if g can't be fully analyzed. Otherwise, a failure in g would mean we can't analyze f at all. I'm not sure what soundness guarantees, if any, we can provide in the case where analysis has panicked on some functions; if we can't provide guarantees regardless, then this approximation wouldn't need to be sound.

@spernsteiner
Copy link
Collaborator Author

Panic reports now include a source span within the code being processed, if one is known. The output format looks like this:

panic at c2rust-analyze/src/context.rs:546:9: unexpected pointer type in *mut *mut _IO_FILE
related location: <rustc_middle::mir::syntax::Operand as c2rust_analyze::context::TypeOf>::type_of::h618b9ae9e6b3fd09 @ /.../c2rust-analyze/src/context.rs:537:41
source location: /.../lighttpd_rust_amalgamated//src/main.rs:91380:13: 91380:19 (#0)
   0: c2rust_analyze::panic_detail::panic_hook
             at src/panic_detail.rs:72:14
   1: core::ops::function::Fn::call
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/core/src/ops/function.rs:77:5
   2: std::panicking::rust_panic_with_hook
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/std/src/panicking.rs:702:17
   3: ...

This makes it easier to understand the context in which the panic occurs.


// Analysis of `bad2` should also fail because it has a callee on which analysis failed.
// CHECK-NOT: final labeling for "bad2"
unsafe fn bad2(p: NonNull<u8>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on adding a test asserting the functionality we know to be broken, e.g. that good2 defined before bad3 where bad3 calls good2, will not produce any of the error messages below? That way we can develop against this test case later on

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new bad_call_good function, which fails analysis and also calls good, and a check that this does not result in the failure propagating from bad_call_good to its callee good. But I'm not sure if that's what you were suggesting.

@kkysen
Copy link
Contributor

kkysen commented Apr 2, 2023

We also need to change the existing feature-specific tests to check that there were no caught panics

Which tests are these? I'd prefer to fix it in this PR so master isn't put into an inconsistent state.

Basically all of them. They all currently check that c2rust-analyze doesn't panic, but now the meaning of panicking changes, so those tests should be changed to check that c2rust-analyze has 0 caught panics.

self.fns_failed.insert(did, detail);

// This is the first time marking `did` as failed, so also mark all of its callers.
let callers = self.fn_callers.get(&did).cloned().unwrap_or(Vec::new());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let callers = self.fn_callers.get(&did).cloned().unwrap_or(Vec::new());
let callers = self.fn_callers.get(&did).cloned().unwrap_or_default();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only removes a type hint that might be useful for the reader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clippy::or_fun_call warning. I don't see a good reason not to follow it. The fact that it's Vec isn't important here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a false positive in Clippy. "This is only bad if it allocates or does some non-trivial amount of work," but "The lint also cannot figure out whether the function you call is actually expensive to call or not." In this case the function being called is const, so it doesn't need to do any run-time work at all. The lint in question has also been downgraded to allow-by-default due to the high rate of false positives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a slightly different warning. When it suggests using a closure that calls a const fn, that is more of a false warning as you were saying, but in this case, .unwrap_or_default() is functionally different (I'm surprised it falls under the same clippy lint, as it's quite different). Using Default::default() for this is more semantically clear (we don't care that it's actually Vec::new(), but that it's the default, empty container). We could conceivably change it to ThinVec, Box<[T]>, IndexSet, etc., in which case the .unwrap_or_default() wouldn't need to be changed since it's the same semantic logic.

That said, I don't really want to spend time debating the merits of clippy lints. I think we should just do what clippy says here. There's no major reason not to, and using Default here is more semantically specific.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just do what clippy says here. There's no major reason not to

There's no good reason to make this change. The fact that Clippy suggested it isn't a good reason on its own, because Clippy often gives bad advice. (Any clippy lints that do regularly give good advice tend to get uplifted into rustc.)

The fact that it's Vec isn't important here.

I disagree. Knowing the type of xs is useful for knowing what you'll get if you do for x in xs.

As far as I know, the official policy for this repo does not require the code to be clippy-clean. Unless that changes (which I would oppose), I don't see any value in making the code (very slightly) worse just to appease the tool.

Copy link
Contributor

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could 059628a and 86dfd30 be split out into a separate PR and merged immediately? Those 2 commits look good to me and are independent of the panicking stuff in the rest of this PR.

Comment on lines 9 to 16
/// Detailed information about a panic.
#[derive(Clone, Debug)]
pub struct PanicDetail {
msg: String,
loc: Option<String>,
relevant_loc: Option<String>,
backtrace: Option<Backtrace>,
sp: Span,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we separate PanicDetail (an actual panic occurred) and FnFailure (either a panic occurred or a called function failed)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we could separate the data itself with the formatting/displaying. Something more like this:

struct PanicDetail {
    msg: String,
    loc: String,
    backtrace: Backtrace,
    span: Span,
}

impl PanicDetail {
    pub fn from_info(info: &PanicInfo) -> Self {
        Self {
            msg: panic_to_string(info.payload()),
            loc: info.location().unwrap().to_string(),
            backtrace: Backtrace::new_unresolved(),
            span: CURRENT_SPAN.with(|cell| cell.get()),
        }
    }
}

and then impl Display on it, or wrappers. Right now the data and formatting are quite intertwined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we separate PanicDetail (an actual panic occurred) and FnFailure (either a panic occurred or a called function failed)?

I think it's okay to leave this for now, since the "called function failed" will (hopefully) be going away some time soon.

we could separate the data itself with the formatting/displaying

I'm not entirely clear on what you're suggesting, but this also seems fine to me as-is. Such implementation details of the panic_detail module don't affect much outside the module itself, which is small and seems unlikely to need major changes in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these fields are optional, all except for msg. They are all set when an actual panic occurs, but when it's a function failing due to one its callees failing, only msg is used. I was thinking it could like this:

struct PanicDetail {
    msg: String,
    loc: String,
    backtrace: Backtrace,
    span: Span,
}

impl PanicDetail {
    pub fn from_info(info: &PanicInfo) -> Self {
        Self {
            msg: panic_to_string(info.payload()),
            loc: info.location().unwrap().to_string(),
            backtrace: Backtrace::new_unresolved(),
            span: CURRENT_SPAN.with(|cell| cell.get()),
        }
    }
}

enum FnFailure {
    Panic(PanicDetail),
    CalleeFailed(String),
}

Also, as I commented in #876 (comment), doing info.location().unwrap() simplifies things here more than just that one line, as then uses of PanicDetail::loc don't need to handle the Option.

I'm not sure what you mean by the "called function failed" will be going away soon, though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by the "called function failed" will be going away soon, though.

I'm currently working on better error recovery, so that we'll no longer need to propagate analysis failures up to callers.

let bt = Backtrace::new();
let detail = PanicDetail {
msg: panic_to_string(info.payload()),
loc: info.location().map(|l| l.to_string()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to do info.location().unwrap(). Currently, it always returns Some, and std .unwrap()s, too. That would simplify a bunch of the logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change only simplifies this single line (loc still needs to be Option since PanicDetail::new doesn't provide a location), and it would make us reliant on an implementation detail of the standard library, so I don't think it's worth it.

Copy link
Contributor

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also set up a snapshot test for this? For the caught panics report. I think that could be pushed to a secondary PR.

@spernsteiner spernsteiner force-pushed the analyze-recover branch 2 times, most recently from 580f136 to dd4a99b Compare May 3, 2023 20:58
@spernsteiner
Copy link
Collaborator Author

Rebased and addressed some final comments about panic-catching defaults in tests

Copy link
Contributor

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a few things I thought should be done a bit differently, but they're ultimately minor. The main parts of this PR—catching and correctly handling panics, and making sure our tests still test the same thing—all look good to me now.

@spernsteiner spernsteiner merged commit 0d6cd64 into master May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants