Document the assumptions we make about the C standard library, that go beyond what C requires #426

@RalfJung

Description

@RalfJung
Member

See https://reviews.llvm.org/D86993: LLVM, and therefore Rust, assume that memcpy, memmove, memset and possibly other C standard library functions satisfy properties which are not required by the C standard. The least we can do is document this properly. However, I don't know where.

It's not just LLVM though, Rust itself also makes extra assumptions. We explicitly allow zero-sized accesses on pointers such as 42 as *const u8, and this includes zero-sized copy_nonoverlapping, copy and write_bytes. So basically we require zero-sized memcpy, memmove and memset to be a NOP. (Technically it could still be UB for NULL, OOB, or UAF pointers, but we might want to change this on the Rust side, and aside from NULL it's not really possible for implementations to exploit that.) This is justified by us emitting LLVM operations that explicitly say size 0 is a NOP -- but I am not sure what other backends are doing.
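A minimal sketch of the Rust-side operations in question, assuming the documented rule that a non-null, suitably aligned pointer is valid for zero-sized accesses. The function name zero_sized_ops is made up for illustration; whether a backend lowers these calls to memcpy, memmove or memset at all is not something this snippet shows.

```rust
use std::ptr;

/// Runs zero-sized copy/fill operations through an integer-derived pointer
/// and returns the destination buffer, which is expected to be untouched.
/// (Hypothetical helper, for illustration only.)
fn zero_sized_ops() -> [u8; 4] {
    let mut buf = [1u8, 2, 3, 4];
    // A non-null pointer that does not point into any allocation.
    let dangling = 42 as *const u8;
    unsafe {
        // All three calls use a count of 0, so no byte is read or written.
        ptr::copy_nonoverlapping(dangling, buf.as_mut_ptr(), 0);
        ptr::copy(dangling, buf.as_mut_ptr(), 0);
        ptr::write_bytes(42 as *mut u8, 0xFF, 0);
    }
    buf
}
```

Calling zero_sized_ops() should hand back the buffer unchanged; if these operations end up as C library calls, that only holds when the zero-sized call really is a NOP for such pointers.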

Activity

chorman0773

chorman0773 commented on Jul 4, 2023

@chorman0773
Contributor

I think there's a separate question of whether these are assumptions of Rust, or of rustc.

In theory, this is just rustc's implementation of these stdlib functions/intrinsics/operations, and another stdlib/compiler, e.g. lccc, could be more correct in its assumptions in terms of what C promises. So the semantic question is whether unsafe code users can rely on memcpy(a,b,n*size_of_val_raw(a)) and core::ptr::copy_nonoverlapping(b,a,n) being exactly identical.

If these are indeed assumptions made by rustc, then they should be documented by rustc (and they can theoretically be anything rustc wants them to be).

chorman0773

chorman0773 commented on Jul 4, 2023

@chorman0773
Contributor

(To be clear, I'm not necessarily saying that lccc would limit its assumptions and generate more conservative code for copy_nonoverlapping et al., but it could - the assumptions rustc makes here are not necessarily fundamental to Rust.)

RalfJung

RalfJung commented on Jul 4, 2023

@RalfJung
Member, Author

Ah fair, I was thinking of rustc. I don't think we want to say anything about what happens when Rust user code calls the C functions.

chorman0773

chorman0773 commented on Jul 4, 2023

@chorman0773
Contributor

Yeah - if it's just a rustc thing, then I don't think it should be T-opsem's job to instruct rustc (or any other compiler, for that matter) what it may assume about its environment, nor to document what assumptions it should make - the documentation should probably exist, but IMO that's a job for T-compiler, not T-opsem.

RalfJung

RalfJung commented on Jul 4, 2023

@RalfJung
Member, Author

The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document). If people have major concerns with making this assumption, maybe we have to reconsider some of these choices?

But yeah mostly this is not t-opsem, but still fits the UCG I think.

digama0

digama0 commented on Jul 4, 2023

@digama0

Yeah, this question confuses me: in what way does any of this impact opsem? These are C functions, not accessible from Rust unless you use the libc crate. Rust has copy_nonoverlapping, and this has fairly clear and obvious preconditions; I don't see how the C spec or the LLVM spec for memcpy impacts the spec of Rust at all.

chorman0773

chorman0773 commented on Jul 4, 2023

@chorman0773
Contributor

> The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document).

Not really, it just means that if rustc can't make this assumption then its codegen+stdlib impl of those semantics is wrong, as it was when it assumed it could generate an empty infinite loop in LLVM, back when infinite loops weren't treated well by LLVM.

> But yeah mostly this is not t-opsem, but still fits the UCG I think.

I'm not really sure of this either, since we aren't directly exposing this to unsafe code, or to user Rust code in any way.

RalfJung

RalfJung commented on Jul 4, 2023

@RalfJung
Member, Author

Hm I guess this is really mostly an LLVM implementation detail then. rustc doesn't even use memcpy/memmove/memset, it uses LLVM intrinsics; it is up to LLVM to implement them properly. I still feel we should document this assumption somewhere but still have no idea where...

If that is the consensus for this situation, there are some action items though:

RalfJung

RalfJung commented on Jul 5, 2023

@RalfJung
Member, Author

Point 2 now has a PR at rust-lang/rust#113347, Point 1 is tracked in the Miri repo, so this can probably be closed.

RalfJung

RalfJung commented on Jul 13, 2023

@RalfJung
Member, Author

Based on the reply I got in rust-lang/rust#113435, the Rust standard library is actually making such assumptions itself directly, not just via LLVM. This concerns the memcmp function, not just a language intrinsic. I don't think we want to special-case the standard library here -- either Miri will accept such pointers in memcmp or it won't.

So we need to either fix the standard library or find a suitable place to document this assumption.
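As a rough illustration of the kind of call this is about, assuming "such pointers" means the dangling, non-null data pointers behind zero-length slices: the sketch below compares two empty byte slices. The helper name empty_slices_equal is hypothetical, and whether this particular comparison actually reaches memcmp depends on the standard library version and on optimizations, so treat that part as an assumption.

```rust
/// Compares two empty byte slices whose data pointers do not refer to
/// any allocation. (Hypothetical helper, for illustration only.)
fn empty_slices_equal() -> bool {
    // An empty Vec performs no heap allocation; its data pointer is
    // dangling but non-null and aligned.
    let a: Vec<u8> = Vec::new();
    // An empty slice literal likewise has length 0.
    let b: &[u8] = &[];
    // Byte-slice equality may be implemented as a zero-length memcmp
    // (assumption: this depends on the library implementation).
    a.as_slice() == b
}
```

On the Rust side the result is simply true; the open question in this thread is only whether a C memcmp reached this way is allowed to misbehave for a length of 0.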

chorman0773

chorman0773 commented on Jul 13, 2023

@chorman0773
Contributor

As I mentioned, the Rust standard library is privileged even if it isn't directly privileged, because it has the exhaustive list of targets it knows are supported, so this isn't necessarily an assumption that can be made by portable Rust code.

In either case, the suitable place to document this, imo, is in a "current implementation" section of the stdlib's (core) documentation.

        Document the assumptions we make about the C standard library, that go beyond what C requires · Issue #426 · rust-lang/unsafe-code-guidelines