Skip to content

SIGILL while running the test suite of a few crates #47616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pietroalbini opened this issue Jan 20, 2018 · 20 comments
Closed

SIGILL while running the test suite of a few crates #47616

pietroalbini opened this issue Jan 20, 2018 · 20 comments
Labels
regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-lang Relevant to the language team
Milestone

Comments

@pietroalbini
Copy link
Member

pietroalbini commented Jan 20, 2018

The test suites of a few crates are crashing with a SIGILL in the latest beta.

cc @PJB3005

@pietroalbini pietroalbini added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Jan 20, 2018
@pietroalbini pietroalbini added this to the 1.24 milestone Jan 20, 2018
@pietroalbini pietroalbini changed the title Tests of byond fail with a SIGILL SIGILL while running the test suite of a few crates Jan 20, 2018
@pietroalbini
Copy link
Member Author

Crate consistenttime also regressed (crater log).

cc @valarauca

@PJB3005
Copy link

PJB3005 commented Jan 20, 2018

It might be related to my crate not being thread safe. This is because the library is intended to be invoked from a proprietary C++ program (BYOND) using FFI, and said program isn't threaded at all, so I saw no need to make it thread safe and just used a mutable static.

I don't test the crate directly, but a library I made using the crate does and I did both RUST_TEST_THREADS=1 and --jobs 1 on the cargo test job for it: https://github.com/vgstation-coders/vgstation13/blob/2edf31f9d0aeea4045240571090156a8a6c88254/.travis.yml#L43

@pietroalbini
Copy link
Member Author

I tested your crate again with just one thread and it still crashes. Can you please manually test it?

Be sure to use the beta channel though: the error doesn't appear on stable (only beta and nightly), and the travis configuration you linked just tests on stable.

@PJB3005
Copy link

PJB3005 commented Jan 20, 2018

Ok yeah I can definitely confirm it's crashing due to a SIGILL on beta. How strange...

@PJB3005
Copy link

PJB3005 commented Jan 20, 2018

Ok I figured it out. Seems like panicking across an extern "C" causes the SIGILL. Example code causing it:

extern "C" fn foo() {
    panic!("uh oh");
}

#[test]
fn does_it_break() {
    foo()
}

Now whether or not this should be legal I'm not sure of, so if this was just an implementation detail I'll just remove the bad tests.

On top of this, looking at the other crate mentioned here, it definitely seems like that one also has tests checking a panic across an extern "C", so that's the same issue probably.

@durka
Copy link
Contributor

durka commented Jan 21, 2018

Panicking across FFI boundaries is definitely undefined behavior.

@PJB3005
Copy link

PJB3005 commented Jan 21, 2018

Yeah I don't doubt it, but on the other hand none of this is explicitly marked unsafe so I don't know how that fits in here.

@sfackler
Copy link
Member

This is caused by #46833 protecting from undefined behavior by aborting instead.

@durka
Copy link
Contributor

durka commented Jan 21, 2018

Huh... I would've thought you'd need unsafe to call any extern "C" function.

@bluss
Copy link
Member

bluss commented Jan 21, 2018

No reason to worry — abort is violent but memory safe. And it is now mentioned in the reference rust-lang/reference#196

@nagisa
Copy link
Member

nagisa commented Jan 21, 2018

extern "C" only specifies the calling convention to use¹. I’m not sure we want to call that a "FFI boundary", as both caller and callee are (safe) Rust code, but it is also something that was never specified to not be a "FFI boundary", so breakage seems fine.

[¹]: using a different calling convention does not make the code/function unsafe.

@pietroalbini
Copy link
Member Author

Crate mujs is also affected. cc @hean01

@pietroalbini
Copy link
Member Author

Crate seckey is also affected. cc @quininer

@Mark-Simulacrum Mark-Simulacrum added the T-lang Relevant to the language team label Jan 31, 2018
@Mark-Simulacrum
Copy link
Member

I believe this is expected -- @rust-lang/lang can you confirm that we are okay with this regression? To be clear, #46833 made it so that we do not unwind across FFI boundaries, and instead abort. Currently inclined to close as these crates were depending on UB.

@joshtriplett
Copy link
Member

How hard would it be, in debug mode, to provide more detailed panic information in this case rather than just an abort?

@PJB3005
Copy link

PJB3005 commented Jan 31, 2018

FWIW, I'm fine with this being "regressed".

@Mark-Simulacrum
Copy link
Member

@joshtriplett @diwic or @nikomatsakis can probably comment on providing more information in debug mode. I think that sounds like something we could improve and add to... seems simple in theory, but may have implications I'm not aware of. There's some comments on #46833 I think.

@nikomatsakis
Copy link
Contributor

This is expected behavior. The idea roughly is that specifying the ABI does indeed specify whether a panic is allowed -- it's not so much about the FFI boundary, as about the fact taht extern "C" functions don't have a defined way to propagate unwinding (particularly since our panic mechanism is an undefined impl detail, and hence specific to the Rust ABI).

Obviously it'd be nice to give a better error message.

@nikomatsakis
Copy link
Contributor

Closing as "expected behavior" in this case =)

@diwic
Copy link
Contributor

diwic commented Feb 1, 2018

The easy answer is that the extern C ABI does not support unwinding and therefore it must be stopped. But that is not entirely accurate, because we do have a few extern C function that can start unwinding. These are marked with the #[unwind] attribute (this attribute is only available on nightly). If the unwind attribute is not present on an extern C function (and some other ABIs as well, such as e g stdcall), we tell LLVM that the function is a "nounwind" function, and unwinding from a function that is "nounwind" - that is, my friends, undefined behaviour.

Now, we could have just told LLVM that no functions are "nounwind". But - apart from possible negative impact on optimisations and/or code size - that would be a against the practical goal here, which is to avoid as many panics into FFI land as possible, where the panic could cause all sorts of unexpected problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-lang Relevant to the language team
Projects
None yet
Development

No branches or pull requests

10 participants