(Maybe) Undefined behavior in safe code from getelementptr inbounds with offset 0 #54857

@jturner314

As far as I can tell, slicing a Vec<T> (in safe code) results in undefined behavior when T is zero-sized or the Vec has zero capacity. I'm probably missing something, but I'm creating this issue in case my investigation is correct.

In particular, these two examples appear to cause undefined behavior due to the .offset() call violating the first safety constraint ("Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.") when performing the slice:

// Example 1: zero-sized T
let v = Vec::from(&[(); 5][..]);
let _ = &v[2..3];

// Example 2: zero-capacity Vec
let v = Vec::<i32>::with_capacity(0);
let _ = &v[0..0];

Example 1

In the first example, v has the field values

Vec {
    buf: RawVec {
        ptr: Unique {
            pointer: NonZero(0x1 as *const ()),
            ..
        },
        ..
    },
    len: 5,
}

(Verify this with v.as_ptr() and v.len().) Performing the slice &v[2..3] expands to approximately the following:

let slice = unsafe {
    let p = v.buf.ptr();
    assume(!p.is_null());
    slice::from_raw_parts(p, v.len)
};
// Note that the pointer of `slice` has value `0x1`.
// (Bounds checks elided here.)
unsafe {
    from_raw_parts((slice as *const [()] as *const ()).offset(2), 3 - 2)
}

So, it's calling ptr.offset(2) where ptr has value 0x1. This pointer is not "in bounds or one byte past the end of [an] allocated object", so the .offset() is undefined behavior. (This pointer was created from casting an integer (the alignment of ()) to a pointer in libcore/ptr.rs, Unique::empty.)
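One detail worth making explicit: because () is zero-sized, advancing the pointer by 2 elements moves it by 2 * size_of::<()>() = 0 bytes, so the resulting address is still the same as the starting one. Here is a minimal sketch of that arithmetic. It uses wrapping_add (which is safe to call on any pointer) rather than .offset(), and the helper name offset_in_bytes is made up for illustration:

```rust
use std::mem::size_of;

/// Returns how many bytes a pointer to `T` moves when advanced by `count`
/// elements. Hypothetical helper for illustration, not part of std.
fn offset_in_bytes<T>(p: *const T, count: usize) -> usize {
    // `wrapping_add` does not have the in-bounds requirement of `.offset()`.
    let q = p.wrapping_add(count);
    (q as usize).wrapping_sub(p as usize)
}

fn main() {
    let v = Vec::from(&[(); 5][..]);
    // Zero-sized elements: advancing by 2 elements moves 0 bytes.
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(offset_in_bytes(v.as_ptr(), 2), 0);

    // For comparison, a non-zero-sized element type really does move.
    let a = [1i32, 2, 3];
    assert_eq!(offset_in_bytes(a.as_ptr(), 2), 2 * size_of::<i32>());
}
```

So Example 1 is also a zero-byte offset, just like the .offset(0) in Example 2.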

Example 2

The second example has a similar issue. In the second example, v has the field values

Vec {
    buf: RawVec {
        ptr: Unique {
            pointer: NonZero(0x4 as *const i32),
            ..
        },
        ..
    },
    len: 0,
}

(Verify this with v.as_ptr() and v.len().) Performing the slice &v[0..0] expands to approximately the following:

let slice = unsafe {
    let p = v.buf.ptr();
    assume(!p.is_null());
    slice::from_raw_parts(p, v.len)
};
// Note that the pointer of `slice` has value `0x4`.
// (Bounds checks elided here.)
unsafe {
    from_raw_parts((slice as *const [i32] as *const i32).offset(0), 0 - 0)
}

So, it's calling ptr.offset(0) where ptr has value 0x4. This pointer is not "in bounds or one byte past the end of [an] allocated object", so the .offset() is undefined behavior. (This pointer was created from casting an integer (the alignment of i32) to a pointer in libcore/ptr.rs, Unique::empty.)
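The field values for both examples can be checked with something like the following sketch. The helper addr_and_len is a made-up name. I'm deliberately not asserting the exact addresses, since the value of the dangling pointer is an implementation detail; on the versions I looked at it prints 0x1 and 0x4.

```rust
use std::mem::align_of;

/// Returns the buffer address and length of a Vec.
/// Hypothetical helper for illustration, not part of std.
fn addr_and_len<T>(v: &Vec<T>) -> (usize, usize) {
    (v.as_ptr() as usize, v.len())
}

fn main() {
    // Example 1: zero-sized T
    let zst = Vec::from(&[(); 5][..]);
    let (addr, len) = addr_and_len(&zst);
    println!("Vec<()>:  ptr = {:#x}, len = {}", addr, len);
    assert_eq!(len, 5);
    assert_ne!(addr, 0); // non-null
    assert_eq!(addr % align_of::<()>(), 0); // aligned

    // Example 2: zero-capacity Vec
    let empty = Vec::<i32>::with_capacity(0);
    let (addr, len) = addr_and_len(&empty);
    println!("Vec<i32>: ptr = {:#x}, len = {}", addr, len);
    assert_eq!(len, 0);
    assert_ne!(addr, 0); // non-null
    assert_eq!(addr % align_of::<i32>(), 0); // aligned
}
```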

Further investigation

There are a few ways that these examples might actually not be undefined behavior:

  1. If the documentation is incorrect, and .offset() is in fact safe if the offset in bytes is zero (even if the pointer is not part of an allocated object).

  2. If LLVM considers Unique::empty to be an allocator so that the returned pointer is considered part of an allocated object. I don't see anything to indicate this is the case, though.

  3. If, somewhere, the runtime allocates the range of bytes with addresses 0x1..=(max possible alignment). This would mean that pointers returned by Unique::empty would be within an allocated object. I don't see anything to indicate this is the case, though, and I'm not entirely convinced that casting an integer to a pointer would work in this case anyway (since the pointer would be derived from an integer instead of offsetting a pointer of an existing allocation).
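To make possibilities 2 and 3 more concrete, here is a rough sketch of how such a pointer comes into existence, based on my reading described above (an integer equal to the alignment of T, cast to a pointer). The name dangling_like is made up; the real code is Unique::empty and may differ in detail.

```rust
use std::mem::align_of;

/// Builds a non-null, well-aligned pointer out of thin air by casting the
/// alignment of `T` to a pointer. No allocator is involved at any point.
/// Hypothetical stand-in for what `Unique::empty` appears to do.
fn dangling_like<T>() -> *mut T {
    align_of::<T>() as *mut T
}

fn main() {
    let p = dangling_like::<i32>();
    // The address is just the alignment of the type...
    assert_eq!(p as usize, align_of::<i32>());
    // ...so it is non-null, but it was never returned by an allocation.
    assert!(!p.is_null());

    let q = dangling_like::<()>();
    assert_eq!(q as usize, align_of::<()>());
    println!("i32: {:p}, (): {:p}", p, q);
}
```

Nothing here creates or refers to an allocated object, which is why I don't see how the resulting pointer could count as "in bounds" of one.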

I did some further investigation into possibility 1.

The .offset() method is converted into an LLVM getelementptr inbounds instruction. (src/libcore/ptr.rs provides the .offset() method, which calls intrinsics::offset. src/libcore/intrinsics.rs declares the extern "rust-intrinsic" offset but not its implementation. The codegen_intrinsic_call function in src/librustc_codegen_llvm/intrinsic.rs handles the "offset" case by calling .inbounds_gep() on the Builder. The implementation of .inbounds_gep() is provided in src/librustc_codegen_llvm/builder.rs, which in turn calls the extern function LLVMBuildInBoundsGEP imported in src/librustc_llvm/ffi.rs. The function is defined in src/llvm/include/llvm-c/Core.h.)

The docs for the LLVM getelementptr inbounds instruction say the following:

If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only in bounds address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise.

The LLVM docs say this about poison values:

Poison values are similar to undef values, however they also represent the fact that an instruction or constant expression that cannot evoke side effects has nevertheless detected a condition that results in undefined behavior.

Poison values have the same behavior as undef values, with the additional effect that any instruction that has a dependence on a poison value has undefined behavior.

As far as I can tell, the reason why the Rust docs for .offset() consider getting a "poison value" to be undefined behavior is that performing any operation with a dependence on the poison value (e.g. printing it with println!) is undefined behavior. In particular, it's possible to perform operations with a dependence on a pointer value in safe code, so a pointer must never be a poison value.

Anyway, back to the safety constraints on .offset(). The constraints listed in the docs for getelementptr inbounds match the constraints listed in the docs for .offset() with one exception: "The only in bounds address for a null pointer in the default address-space is the null pointer itself." This means that even though a null pointer is not part of an allocation, it's still safe to perform an offset of 0 bytes on it. The docs for getelementptr inbounds don't indicate that this is true for non-null pointers, though, which is the case described in this issue (slicing a Vec with zero-size elements or zero capacity).
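For contrast, the standard library also has .wrapping_offset(), whose documentation says the operation itself is always safe (only using the resulting pointer may not be). The sketch below just shows that a zero offset on a dangling pointer is expressible that way; it is not a claim about how the slicing code should be written. The helper name advance is made up, and I'm using NonNull::dangling() as a stand-in for the pointer stored in an empty Vec.

```rust
use std::ptr::NonNull;

/// Advances `p` by `count` elements using `wrapping_offset`, which has no
/// in-bounds requirement. Hypothetical helper for illustration.
fn advance<T>(p: *const T, count: isize) -> *const T {
    p.wrapping_offset(count)
}

fn main() {
    // A dangling (non-null, aligned, unallocated) pointer, offset by zero.
    let dangling: *const i32 = NonNull::<i32>::dangling().as_ptr();
    assert_eq!(advance(dangling, 0), dangling);

    // Within a real allocation it behaves like ordinary pointer arithmetic.
    let a = [10i32, 20, 30];
    let q = advance(a.as_ptr(), 2);
    // Sound to read: `q` points at `a[2]`, inside the same live array.
    assert_eq!(unsafe { *q }, 30);
}
```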

Meta

This appears to be an issue in both stable (1.29.1) and nightly.

Labels: I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness); T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
