Description
As far as I can tell, slicing a Vec<T> (in safe code) results in undefined behavior when T is zero-sized or the Vec has zero capacity. I'm probably missing something, but I'm creating this issue in case my investigation is correct.
In particular, these two examples appear to cause undefined behavior due to the .offset() call violating the first safety constraint ("Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.") when performing the slice:
// Example 1: zero-sized T
let v = Vec::from(&[(); 5][..]);
let _ = &v[2..3];
// Example 2: zero-capacity Vec
let v = Vec::<i32>::with_capacity(0);
let _ = &v[0..0];
Example 1
In the first example, v has the following field values:
Vec {
    buf: RawVec {
        ptr: Unique {
            pointer: NonZero(0x1 as *const ()),
            ..
        },
        ..
    },
    len: 5,
}
(Verify this with v.as_ptr() and v.len().) Performing the slice &v[2..3] expands to approximately the following:
let slice = unsafe {
    let p = v.buf.ptr();
    assume(!p.is_null());
    slice::from_raw_parts(p, v.len)
};
// Note that the pointer of `slice` has value `0x1`.
// (Bounds checks elided here.)
unsafe {
    from_raw_parts((slice as *const [()] as *const ()).offset(2), 3 - 2)
}
So, it's calling ptr.offset(2) where ptr has value 0x1. This pointer is not "in bounds or one byte past the end of [an] allocated object", so the .offset() is undefined behavior. (This pointer was created from casting an integer (the alignment of ()) to a pointer in libcore/ptr.rs, Unique::empty.)
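The values above can be checked from safe code. The following is a minimal sketch; the helper name `zst_vec_facts` is hypothetical and only used for illustration. It also computes the offset in bytes for index 2, which is zero because `()` has size zero.

```rust
use std::mem;

/// Hypothetical helper: returns (pointer address, length, byte offset of index 2)
/// for a Vec holding five `()` values, built the same way as in Example 1.
fn zst_vec_facts() -> (usize, usize, usize) {
    let v = Vec::from(&[(); 5][..]);
    let addr = v.as_ptr() as usize;
    // The element size is zero, so any element offset is zero bytes.
    let byte_offset = 2 * mem::size_of::<()>();
    (addr, v.len(), byte_offset)
}

fn main() {
    let (addr, len, byte_offset) = zst_vec_facts();
    println!("ptr = {:#x}, len = {}, byte offset = {}", addr, len, byte_offset);

    // The slice from Example 1.
    let v = Vec::from(&[(); 5][..]);
    let s = &v[2..3];
    println!("slice len = {}", s.len());
}
```

The printed address is whatever dangling pointer the standard library chose for a non-allocating Vec; this issue reports 0x1 for the versions tested.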
Example 2
The second example has a similar issue. Here, v has the following field values:
Vec {
    buf: RawVec {
        ptr: Unique {
            pointer: NonZero(0x4 as *const i32),
            ..
        },
        ..
    },
    len: 0,
}
(Verify this with v.as_ptr() and v.len().) Performing the slice &v[0..0] expands to approximately the following:
let slice = unsafe {
    let p = v.buf.ptr();
    assume(!p.is_null());
    slice::from_raw_parts(p, v.len)
};
// Note that the pointer of `slice` has value `0x4`.
// (Bounds checks elided here.)
unsafe {
    from_raw_parts((slice as *const [i32] as *const i32).offset(0), 0 - 0)
}
So, it's calling ptr.offset(0) where ptr has value 0x4. This pointer is not "in bounds or one byte past the end of [an] allocated object", so the .offset() is undefined behavior. (This pointer was created from casting an integer (the alignment of i32) to a pointer in libcore/ptr.rs, Unique::empty.)
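As with Example 1, these values can be observed from safe code. A minimal sketch follows; the helper name `empty_vec_facts` is hypothetical. It checks that the Vec has no capacity, that its pointer is non-null and aligned for `i32`, and that the slice `&v[0..0]` is empty.

```rust
use std::mem;

/// Hypothetical helper: returns (pointer address, capacity, length, slice length)
/// for the zero-capacity Vec from Example 2.
fn empty_vec_facts() -> (usize, usize, usize, usize) {
    let v = Vec::<i32>::with_capacity(0);
    let addr = v.as_ptr() as usize;
    // The slice from Example 2: zero elements starting at index 0.
    let s = &v[0..0];
    (addr, v.capacity(), v.len(), s.len())
}

fn main() {
    let (addr, cap, len, slice_len) = empty_vec_facts();
    println!(
        "ptr = {:#x} (align_of::<i32>() = {}), capacity = {}, len = {}, slice len = {}",
        addr,
        mem::align_of::<i32>(),
        cap,
        len,
        slice_len
    );
}
```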
Further investigation
There are a few ways that these examples might actually not be undefined behavior:
1. If the documentation is incorrect, and .offset() is in fact safe if the offset in bytes is zero (even if the pointer is not part of an allocated object).
2. If LLVM considers Unique::empty to be an allocator so that the returned pointer is considered part of an allocated object. I don't see anything to indicate this is the case, though.
3. If, somewhere, the runtime allocates the range of bytes with addresses 0x1..=(max possible alignment). This would mean that pointers returned by Unique::empty would be within an allocated object. I don't see anything to indicate this is the case, though, and I'm not entirely convinced that casting an integer to a pointer would work in this case anyway (since the pointer would be derived from an integer instead of offsetting a pointer of an existing allocation).
I did some further investigation into possibility 1.
The .offset() method is converted into an LLVM getelementptr inbounds instruction. (src/libcore/ptr.rs provides the .offset() method, which calls intrinsics::offset. src/libcore/intrinsics.rs defines the extern "rust-intrinsic" offset but not the implementation. The codegen_intrinsic_call function in src/librustc_codegen_llvm/intrinsic.rs handles the "offset" case by calling .inbounds_gep() in the Builder. The implementation of .inbounds_gep() is provided in src/librustc_codegen_llvm/builder.rs, which in turn calls the extern function LLVMBuildInBoundsGEP imported in src/librustc_llvm/ffi.rs. The function is defined in src/llvm/include/llvm-c/Core.h.)
The docs for the LLVM getelementptr inbounds instruction say the following:
If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only in bounds address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise.
The LLVM docs say this about poison values:
Poison values are similar to undef values, however they also represent the fact that an instruction or constant expression that cannot evoke side effects has nevertheless detected a condition that results in undefined behavior.
…
Poison values have the same behavior as undef values, with the additional effect that any instruction that has a dependence on a poison value has undefined behavior.
As far as I can tell, the reason why the Rust docs for .offset() consider getting a "poison value" to be undefined behavior is that performing any operation with a dependence on the poison value (e.g. printing it with println!) is undefined behavior. In particular, it's possible to perform operations with a dependence on a pointer value in safe code, so a pointer must never be a poison value.
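To illustrate that last point, here is a small sketch of safe code whose result depends on the pointer value of a slice. The function name `observe_slice_pointer` is hypothetical; it formats the pointer and converts it to an integer, both of which are ordinary safe operations.

```rust
/// Hypothetical helper: slices `v` as in Example 1 and returns the slice's
/// data pointer both as formatted text and as an integer address.
fn observe_slice_pointer(v: &[()]) -> (String, usize) {
    let s = &v[2..3];
    let p = s.as_ptr();
    // Both operations below depend on the pointer value and need no `unsafe`.
    (format!("{:p}", p), p as usize)
}

fn main() {
    let v = Vec::from(&[(); 5][..]);
    let (text, addr) = observe_slice_pointer(&v);
    println!("pointer as text: {}, as integer: {}", text, addr);
}
```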
Anyway, back to the safety constraints on .offset(). The constraints listed in the docs for getelementptr inbounds match the constraints listed in the docs for .offset() with one exception: "The only in bounds address for a null pointer in the default address-space is the null pointer itself." This means that even though a null pointer is not part of an allocation, it's still safe to perform an offset of 0 bytes on it. The docs for getelementptr inbounds don't indicate that this is true for non-null pointers, though, which is the case described in this issue (slicing a Vec with zero-size elements or zero capacity).
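For contrast, raw pointers also have a `wrapping_offset` method, which is a safe function and is documented as not requiring the starting or resulting pointer to be in bounds of an allocated object. The sketch below only shows that it can be called on a pointer made from an integer, like the 0x4 value in Example 2; the helper name `advance_wrapping` is hypothetical, and this is not meant as a proposed fix for the slicing code.

```rust
/// Hypothetical helper: advances `p` by `count` elements using pointer
/// arithmetic that has no in-bounds requirement.
fn advance_wrapping(p: *const i32, count: isize) -> *const i32 {
    p.wrapping_offset(count)
}

fn main() {
    // A pointer created from an integer, not from any allocation.
    let p = 0x4usize as *const i32;
    println!("offset by 0: {:p}", advance_wrapping(p, 0));
    println!("offset by 2: {:p}", advance_wrapping(p, 2));
}
```

The resulting pointers are only computed here, never dereferenced.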
Meta
This appears to be an issue in both stable (1.29.1) and nightly.