You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
See https://reviews.llvm.org/D86993: LLVM, and therefore Rust, assume that memcpy, memmove, memset and possibly other C standard library functions satisfy properties which are not required by the C standard. The least we can do is document this properly. However, I don't know where.
It's not just LLVM though, Rust itself also makes extra assumptions. We explicitly allow zero-sized accesses on pointers such as 42 as *const u8, and this includes zero-sized copy_nonoverlapping, copy and write_bytes. So basically we require zero-sized memcpy, memmove, memset to be a NOP. (Technically it could still be UB for NULL, OOB, or UAF pointers, but we might want to change this on the Rust side and aside from NULL it's not really possibly for implementations to exploit that.) This is justified by us emitting LLVM operations that explicitly say size 0 is a NOP -- but I am not sure what other backends are doing.
I think there's a separate question of whether these are assumptions of Rust, or of rustc.
In theory, this is just rustc's implementation of these stdlib functions/intrinsics/operations, and another stdlib/compiler, e.g. lccc, could be more correct in its assumptions in terms of what C promises. So the semantic question is whether unsafe code users can rely on memcpy(a,b,n*size_of_val_raw(a)) and core::ptr::copy_nonoverlapping(b,a,n) being exactly identical.
If these are indeed assumptions made by rustc, then they should be documented by rustc (and they can theoretically be anything rustc wants them to be).
(To be clear, I'm not necessarily saying that lccc would limit its assumptions and generate more conservative code for copy_nonoverlapping et. al, but it could - the assumptions rustc makes here are not necessarily fundamental to Rust)
Yeah - if it's just a rustc thing, then I don't think it should be T-opsem's job to instruct rustc (or any other compiler, for that matter) what assumptions it may assume about its environment, nor to document that what assumptions it should make - the documentation should probably exist, but IMO that's a job for T-compiler, not T-opsem.
The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document). If people have major concerns with making this assumption, maybe we have to reconsider some of these choices?
But yeah mostly this is not t-opsem, but still fits the UCG I think.
Yeah this question confuses me, in what way does any of this impact opsem? These are C functions, not accessible from rust unless you use the libc crate. Rust has copy_nonoverlapping and this has fairly clear and obvious preconditions; I don't see how the C spec or the LLVM spec for memcpy impact the spec of rust at all.
The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document).
Not really, it just means that if rustc doesn't make this assumption then it's codegen+stdlib impl of those semantics are wrong, as it was when it was assuming it could generate an empty infinite loop in llvm back when infinite loops weren't treated well by llvm.
But yeah mostly this is not t-opsem, but still fits the UCG I think.
I'm not really sure of this either, since we aren't directly exposing this to unsafe code, or user rust code in any way.
Hm I guess this is really mostly an LLVM implementation detail then. rustc doesn't even use memcpy/memmove/memset, it uses LLVM intrinsics; it is up to LLVM to implement them properly. I still feel we should document this assumption somewhere but still have no idea where...
If that is the consensus for this situation, there are some action items though:
also our own docs say things like "copy_nonoverlapping is semantically equivalent to C’s memcpy, but with the argument order swapped"; that should probably also be fixed then
Based on the reply I got in rust-lang/rust#113435, the Rust standard library is actually making such assumptions itself directly, not just via LLVM. This concerns the memcmp function, not just a language intrinsic. I don't think we want to special-case the standard library here -- either Miri will accept such pointers in memcmp or it won't.
So we need to either fix the standard library or find a suitable place to document this assumption.
As I mentioned, the rust standard library is privileged even if it isn't directly privileged, becaue it has the exhaustive list of targets it knows are supported, so this isn't necessarily an assumption that can be made by portable rust code.
In either case, the suitable place to document this, imo, is in a current implementation section on the stdlib's (core) documentation.
Activity
chorman0773 commentedon Jul 4, 2023
I think there's a separate question of whether these are assumptions of Rust, or of rustc.
In theory, this is just rustc's implementation of these stdlib functions/intrinsics/operations, and another stdlib/compiler, e.g.
lccc
, could be more correct in its assumptions in terms of what C promises. So the semantic question is whether unsafe code users can rely onmemcpy(a,b,n*size_of_val_raw(a))
andcore::ptr::copy_nonoverlapping(b,a,n)
being exactly identical.If these are indeed assumptions made by rustc, then they should be documented by rustc (and they can theoretically be anything rustc wants them to be).
chorman0773 commentedon Jul 4, 2023
(To be clear, I'm not necessarily saying that lccc would limit its assumptions and generate more conservative code for
copy_nonoverlapping
et. al, but it could - the assumptions rustc makes here are not necessarily fundamental to Rust)RalfJung commentedon Jul 4, 2023
Ah fair, I was thinking of rustc. I don't think we want to say anything about what happens when Rust user code calls the C functions.
chorman0773 commentedon Jul 4, 2023
Yeah - if it's just a rustc thing, then I don't think it should be T-opsem's job to instruct rustc (or any other compiler, for that matter) what assumptions it may assume about its environment, nor to document that what assumptions it should make - the documentation should probably exist, but IMO that's a job for T-compiler, not T-opsem.
RalfJung commentedon Jul 4, 2023
The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document). If people have major concerns with making this assumption, maybe we have to reconsider some of these choices?
But yeah mostly this is not t-opsem, but still fits the UCG I think.
digama0 commentedon Jul 4, 2023
Yeah this question confuses me, in what way does any of this impact opsem? These are C functions, not accessible from rust unless you use the
libc
crate. Rust hascopy_nonoverlapping
and this has fairly clear and obvious preconditions; I don't see how the C spec or the LLVM spec formemcpy
impact the spec of rust at all.chorman0773 commentedon Jul 4, 2023
Not really, it just means that if rustc doesn't make this assumption then it's codegen+stdlib impl of those semantics are wrong, as it was when it was assuming it could generate an empty infinite loop in llvm back when infinite loops weren't treated well by llvm.
I'm not really sure of this either, since we aren't directly exposing this to unsafe code, or user rust code in any way.
RalfJung commentedon Jul 4, 2023
Hm I guess this is really mostly an LLVM implementation detail then. rustc doesn't even use
memcpy
/memmove
/memset
, it uses LLVM intrinsics; it is up to LLVM to implement them properly. I still feel we should document this assumption somewhere but still have no idea where...If that is the consensus for this situation, there are some action items though:
ptr::copy_nonoverlapping
is not memcpy rust-lang/rust#113347RalfJung commentedon Jul 5, 2023
Point 2 now has a PR at rust-lang/rust#113347, Point 1 is tracked in the Miri repo, so this can probably be closed.
RalfJung commentedon Jul 13, 2023
Based on the reply I got in rust-lang/rust#113435, the Rust standard library is actually making such assumptions itself directly, not just via LLVM. This concerns the
memcmp
function, not just a language intrinsic. I don't think we want to special-case the standard library here -- either Miri will accept such pointers inmemcmp
or it won't.So we need to either fix the standard library or find a suitable place to document this assumption.
chorman0773 commentedon Jul 13, 2023
As I mentioned, the rust standard library is privileged even if it isn't directly privileged, becaue it has the exhaustive list of targets it knows are supported, so this isn't necessarily an assumption that can be made by portable rust code.
In either case, the suitable place to document this, imo, is in a current implementation section on the stdlib's (
core
) documentation.5 remaining items