-
Notifications
You must be signed in to change notification settings - Fork 13.7k
Description
The following program causes UB:
use std::os::windows::ffi::{OsStrExt, OsStringExt};
use std::ffi::{OsStr, OsString};
fn main() {
let base = "a\té \u{7f}💩\r";
let mut base: Vec<u16> = OsStr::new(base).encode_wide().collect();
base.push(0xD800);
let _res = OsString::from_wide(&base);
}
Miri says:
error: Undefined Behavior: type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
--> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
|
102 | transmute(i)
| ^^^^^^^^^^^^ type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `std::char::from_u32_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
= note: inside `std::sys_common::wtf8::Wtf8Buf::push_code_point_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:204:26
= note: inside `std::sys_common::wtf8::Wtf8Buf::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:194:21
= note: inside `<std::ffi::OsString as std::os::windows::ffi::OsStringExt>::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys/windows/ext/ffi.rs:101:44
note: inside `main` at wtf8.rs:8:16
--> wtf8.rs:8:16
|
8 | let _res = OsString::from_wide(&base);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
The problem is this code:
rust/src/libstd/sys_common/wtf8.rs
Lines 293 to 305 in 96dd469
pub fn push(&mut self, code_point: CodePoint) { | |
if let trail @ 0xDC00..=0xDFFF = code_point.to_u32() { | |
if let Some(lead) = (&*self).final_lead_surrogate() { | |
let len_without_lead_surrogate = self.len() - 3; | |
self.bytes.truncate(len_without_lead_surrogate); | |
self.push_char(decode_surrogate_pair(lead, trail as u16)); | |
return; | |
} | |
} | |
// No newly paired surrogates at the boundary. | |
self.push_code_point_unchecked(code_point) | |
} |
This calls push_code_point_unchecked
unless the new code point is in 0xDC00..=0xDFFF
, but what about surrogates in 0xD800..0xDC00
?
This code is unchanged since its introduction in c5369eb. I am not sure what the intended safety contract of push_code_point_unchecked
is. That method is not marked unsafe
but clearly should be -- it calls char::from_u32_unchecked
. So my guess is the safety precondition is that CodePoint
must not be part of a surrogate pair, but the thing is, push
calls it without actually ensuring that condition. The condition it does ensure is that the codepoint is not in 0xDC00..=0xDFFF
, but that does not help.