Skip to content

SIGILLs when panicking #48088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MageSlayer opened this issue Feb 9, 2018 · 10 comments
Closed

SIGILLs when panicking #48088

MageSlayer opened this issue Feb 9, 2018 · 10 comments
Labels
A-FFI Area: Foreign function interface (FFI)

Comments

@MageSlayer
Copy link

MageSlayer commented Feb 9, 2018

Hi

I am trying to debug some regression in compiler leading to triggering SIGILLs when raising exceptions/panicking. More precisely, SIGILLs happen at unwinding stage after drop_in_place calls in frame cleaning. Technically it's seen as a jump to ud2 instruction.

Quick bisecting using precompiled binaries resulted in finding nightly-2017-12-25-x86_64-unknown-linux-gnu as the first culprit. nightly-2017-12-24-x86_64-unknown-linux-gnu still works fine.

Unfortunately, I cannot reproduce it in easy test-case.
I am under Linux x64, no optimization/debug build.

I'll try to bisect using git, but recompiling compiler is really slow, so results are not coming quickly.

@retep998
Copy link
Member

retep998 commented Feb 9, 2018

If you're getting a SIGILL because it executed a ud2, it is probably something deciding to call intrinsics::abort(), which can be caused by a few things, one of which is panicking while unwinding from another panic.

@MageSlayer
Copy link
Author

Yes, I know that.
I am trying to get the reason for that. And why it appeared in Dec25 build.

Moreover, it's not a user-code but exception frame clean-ups ("landing pads" or whatever it's called), so I have a strong suspicion that something changed/regressed in compiler.

@MageSlayer
Copy link
Author

@alexcrichton
Do you happen to know how to determine which commits went to pre-built binaries on specific date?
It would reduce number of iterations for git bisect.

@nagisa
Copy link
Member

nagisa commented Feb 9, 2018

And why it appeared in Dec25 build.

Sounds a lot like #46833.

@MageSlayer
Copy link
Author

MageSlayer commented Feb 9, 2018

@nagisa @diwic @arielb1
Yes. It's definitely looks like those changes. And that's scary.
I have several questions (maybe you know the answers or somebody to invite for discussion):

  1. Why is extern "[any calling convention]" now regarded as FFI boundary? I had quite a measurable speed-up changing hot functions into "fastcall"/"vectorcall".
  2. Maybe it's possible avoid marking them external and still have them use fastcall/vectorcall + proper unwind? Adding #[unwind] does not seem to help...
  3. What about forward references for Rust functions declared as "extern" for some other crate to satisfy them? It's all Rust, no FFI, but now just mentioning "extern" on them leads to unwind disable? That's definitively not good..
  4. Why compiler does not warn about using unwind function inside "now-non-unwinding" function?

EDIT:
Looks like changing fastcall/vectorcall into Rust calling convention using extern "Rust" brings unwinding back. My impression is that #[unwind] attribute functionality overlaps with "extern [non-Rust]" construction.

@nagisa
Copy link
Member

nagisa commented Feb 9, 2018

Why is extern "[any calling convention]" now regarded as FFI boundary? I had quite a measurable speed-up changing hot functions into "fastcall"/"vectorcall".

extern "foo" where foo is not either Rust or rust-call, was never specified to not be a FFI bounary. We always informed LLVM that these "FFI boundary" functions were nounwind before, so unwinding from these was always undefined. See also this comment onwards.

Maybe it's possible avoid marking them external and still have them use fastcall/vectorcall + proper unwind? Adding #[unwind] does not seem to help...

#[unwind] not helping seems like a bug that should be reported separately. Maybe you’re unwinding past some other function that you forgot to annotate with #[unwind]? Can you minimise the test-case?

#[unwind] should definitely disable the trap, remove the nounwind annotation in generated code etc etc.

What about forward references for Rust functions declared as "extern" for some other crate to satisfy them? It's all Rust, no FFI, but now just mentioning "extern" on them leads to unwind disable? That's definitively not good..

Those should work just fine, as extern "Rust" and extern "rust-call" are still not considered a "FFI boundary".

Why compiler does not warn about using unwind function inside "now-non-unwinding" function?

Such lint would definitely be very false-positive happy, so it has no place in the compiler.

@MageSlayer
Copy link
Author

MageSlayer commented Feb 9, 2018

Ok, I got it.

Still #[unwind] does not work. What I did is:
forward declaration

#[allow(improper_ctypes)]
extern "fastcall" {
    #[unwind]
    pub fn eval(ast: &AST, interpreter: &Interpreter) -> AST;
 ...

and actual function definition

#[no_mangle]
#[unwind]
pub extern "fastcall" fn eval(ast: &AST, interpreter: &Interpreter) -> AST {
...
}

... for each function marked as "extern" and it still triggers UB on latest nightly.
I'll try to make a small example.

@diwic
Copy link
Contributor

diwic commented Feb 9, 2018

Looking at MSDN's docs for vectorcall - which I assume is what "vectorcall" refers to - it does not mention how to handle exceptions, stack unwinding, etc. So I guess it's fair to say that exceptions/panics are not supported in this calling convention.

But if vectorcall is actually faster than rust-call, then we should create a new rust-vectorcall convention, which does support panics and uses registers/stack just as MS vectorcall does. (And then we could try switching to it by default, if the majority of calls are quicker with that.)

@retep998
Copy link
Member

retep998 commented Feb 9, 2018

At least for Windows, exceptions and stack unwinding are handled in a standardized way using SEH, and independent of the calling convention used. The only thing that gets in the way of unwinding across those function boundaries is the fact that that Rust emits the nounwind attribute along with code to catch an unwind if it does occur and abort. Unless you're actually doing FFI in which case the other language has to be equipped to handle unwinding using the same system.

Because the rust-call and Rust calling conventions do not have any ABI stability, we can easily change them to behave more like vectorcall if it was found to be beneficial in general.

@pietroalbini pietroalbini added the A-FFI Area: Foreign function interface (FFI) label Feb 10, 2018
@pietroalbini
Copy link
Member

Thanks for reporting this! We already knew about this problem, but it was decided to keep sending SIGILLs when a panic occurs across extern functions. Closing this issue as expected behavior.

Please open another issue about #[unwind] though, so we can investigate it!

RReverser pushed a commit to cloudflare/wirefilter that referenced this issue Feb 20, 2019

Verified

This commit was signed with the committer’s verified signature.
parlough Parker Lougheed
RReverser pushed a commit to cloudflare/wirefilter that referenced this issue Feb 20, 2019

Verified

This commit was signed with the committer’s verified signature.
parlough Parker Lougheed
RReverser pushed a commit to cloudflare/wirefilter that referenced this issue Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-FFI Area: Foreign function interface (FFI)
Projects
None yet
Development

No branches or pull requests

5 participants