
Rust 1.90 compiler optimization (O2, O3) triggers UB with Box::from_raw() #147265

@jschwe

Code

I minimized the issue to the following testcase, which uses serde and bincode to "serialize" a structure by boxing it, serializing the exposed address of the box, and reconstructing the box from the deserialized address.
I believe this to be valid as long as the same serialized address is never deserialized twice. However, the testcase below fails due to a manifestation of UB.

use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::hint::black_box;
use std::marker::PhantomData;
use std::sync::Arc;
use std::{fmt, ptr};

struct MySender<T> {
    inner: Arc<T>,
}

impl<T> Clone for MySender<T> {
    fn clone(&self) -> Self {
        Self {
            inner: self.inner.clone(),
        }
    }
}

impl<T> MySender<T>
where
    T: Serialize,
{
    fn send(&self, value: T) -> Result<(), ()> {
        black_box(value);
        Ok(())
    }
}

impl<T> Serialize for MySender<T>
where
    T: Serialize,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // We know everything is in one address-space, so we can "serialize" the sender by
        // sending a leaked Box pointer.
        let leaked_sender = Box::into_raw(Box::new(self.clone()));
        let sender_clone_addr: usize = leaked_sender.expose_provenance();
        println!("Serialized addr is {:X}", sender_clone_addr);
        serializer.serialize_newtype_struct("MySender", &sender_clone_addr)
    }
}

struct MySenderVisitor<T> {
    marker: PhantomData<T>,
}

impl<'de, T: Serialize + Deserialize<'de>> serde::de::Visitor<'de> for MySenderVisitor<T> {
    type Value = MySender<T>;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("a MySender")
    }

    // ---------
    // Variant 1: deserialize usize: Testcase fails (UB manifests)
    // ---------

    fn visit_newtype_struct<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        // Newtype structs transparently serialize to the inner type.
        let addr = usize::deserialize(deserializer)?;
        let is_aligned = addr % align_of::<Self::Value>() == 0;
        // Adding these checks introduces branches, which makes the UB manifest more
        // visibly. If the checks are removed or commented out, the Box::from_raw will fail instead.
        if addr == 0 {
            return Err(D::Error::custom("address is zero"));
        } else if !is_aligned {
            // Result: "addr 103791B10 is not aligned to 8. is_aligned: true"
            // The above line is only possible if we have UB (and the condition check is optimized away).
            let msg = format!(
                "addr {addr:X} is not aligned to {}. is_aligned: {is_aligned}",
                align_of::<Self::Value>()
            );
            return Err(D::Error::custom(msg));
        }
        let ptr: *mut Self::Value = ptr::with_exposed_provenance_mut(addr);
        assert!(!ptr.is_null());
        let reconstructed = unsafe { Box::from_raw(ptr) };
        Ok(*reconstructed)
    }

    // ---------
    // Variant 2: deserialize u64 and cast to usize: Testcase passes
    // ---------

    // fn visit_newtype_struct<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    // where
    //     D: Deserializer<'de>,
    // {
    //     deserializer.deserialize_u64(self)
    // }
    //
    // fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
    // where
    //     E: Error,
    // {
    //     let addr = v as usize;
    //     let ptr: *mut Self::Value = ptr::with_exposed_provenance_mut(addr);
    //     assert!(!ptr.is_null());
    //     let reconstructed = unsafe { Box::from_raw(ptr) };
    //     Ok(*reconstructed)
    // }
}

impl<'a, T: Serialize + Deserialize<'a>> Deserialize<'a> for MySender<T> {
    fn deserialize<D>(d: D) -> Result<MySender<T>, D::Error>
    where
        D: Deserializer<'a>,
    {
        d.deserialize_newtype_struct(
            "MySender",
            MySenderVisitor {
                marker: PhantomData,
            },
        )
    }
}

#[cfg(test)]
mod single_process_channel_tests {
    use std::sync::Arc;

    use crate::MySender;

    // This test works / does not show signs of UB
    #[test]
    fn serialize_roundtrip_bincode2() {
        let generic_sender = MySender {
            inner: Arc::new(42),
        };
        let config = bincode2::config::legacy();
        let data = bincode2::serde::encode_to_vec(&generic_sender, config.clone()).unwrap();
        eprintln!("Serialized: {data:?} - len: {}", data.len());

        let (reconstructed, _len): (MySender<u64>, _) =
            bincode2::serde::decode_from_slice(&data, config).unwrap();
        reconstructed.send(42).unwrap();
    }

    // This test case manifests UB
    #[test]
    fn serialize_roundtrip_bincode() {
        let generic_sender: MySender<u64> = MySender {
            inner: Arc::new(42_u64),
        };
        let mut data = Vec::with_capacity(1024);
        bincode::serialize_into(&mut data, &generic_sender).expect("Serialization failed");
        eprintln!("Serialized: {data:?}",);

        let reconstructed: MySender<u64> =
            bincode::deserialize(&data).expect("Deserialization failed");

        reconstructed.send(42_u64).unwrap();
    }
}
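
For readers who want the core pattern without the serde and bincode machinery, here is a minimal std-only sketch of the same address round trip. The helper names `leak_to_bytes` and `reclaim_from_bytes` are hypothetical and not part of the testcase, and this sketch is not claimed to reproduce the miscompile; it only illustrates the leak, expose, and reconstruct steps under the assumption that each address is reclaimed exactly once.

```rust
use std::mem::align_of;
use std::ptr;
use std::sync::Arc;

/// Hypothetical helper: leak a boxed value and return its address as bytes.
fn leak_to_bytes<T>(value: T) -> [u8; 8] {
    let raw: *mut T = Box::into_raw(Box::new(value));
    // Expose the provenance so the address can be turned back into a pointer later.
    let addr: usize = raw.expose_provenance();
    (addr as u64).to_le_bytes()
}

/// Hypothetical helper: rebuild the box from the bytes and move the value out.
///
/// # Safety
/// `bytes` must come from `leak_to_bytes::<T>` and must be reclaimed at most once.
/// The zero and alignment checks mirror the testcase; they do not prove validity.
unsafe fn reclaim_from_bytes<T>(bytes: [u8; 8]) -> Result<T, String> {
    let addr = u64::from_le_bytes(bytes) as usize;
    if addr == 0 {
        return Err("address is zero".to_string());
    }
    if addr % align_of::<T>() != 0 {
        return Err(format!(
            "addr {addr:X} is not aligned to {}",
            align_of::<T>()
        ));
    }
    let raw: *mut T = ptr::with_exposed_provenance_mut(addr);
    // Safety: guaranteed by the caller (see the function-level contract).
    Ok(*unsafe { Box::from_raw(raw) })
}

fn main() {
    let bytes = leak_to_bytes(Arc::new(42_u64));
    let value: Arc<u64> = unsafe { reclaim_from_bytes(bytes) }.expect("round trip failed");
    assert_eq!(*value, 42);
}
```

The type used on the reclaiming side has to match the type that was leaked, since the address alone carries no type information.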

Cargo.toml:

[package]
name = "test_ub"
version = "0.1.0"
edition = "2024"

[lib]
name = "test_ub"

[dependencies]
serde = { version = "1.0.225", features = ["derive"] }
bincode2 = {package = "bincode", version = "2.0.1", features = ["serde"]}
bincode = "1.3.3"

[profile.release]
debug-assertions = true
opt-level = 3
debug = true

I expected to see this happen: The test cases pass in release mode (with debug assertions enabled)

Instead, this happened: The test case serialize_roundtrip_bincode fails due to a manifestation of UB in release mode. (See below for details)

Version it worked on

It most recently worked on: Rust 1.89

Version with regression

rustc --version --verbose:

rustc 1.90.0 (1159e78c4 2025-09-14)
binary: rustc
commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
commit-date: 2025-09-14
host: aarch64-apple-darwin
release: 1.90.0
LLVM version: 20.1.8

Still reproduces with the latest nightly (1.92). I believe it also reproduces on Linux, although I haven't tested that with this minimal reproducer.

UB manifestation

During program execution (cargo test --release --lib), we reach line 76 of the testcase (the format! call inside the else if branch) during deserialization. The printed error message reveals that the else if condition check got optimised away: the condition is !is_aligned, but the message reports is_aligned as true, which is only possible if there is UB.
Passing the address (usize) through black_box during deserialization, or printing the address, makes the UB manifestation disappear.
I believe the above program to be valid and assume that serde is sound. I can still reproduce the issue after replacing the 2 unsafe usages in bincode 1.3.3 that we hit (reading and writing a u64) with safe Rust.
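
As a rough illustration of the black_box observation, the sketch below hides the address from the optimizer right before the pointer is reconstructed. The function name `box_from_addr_opaque` is hypothetical, and this is not a fix: the standard library documents black_box as a best-effort hint that must not be relied on for correctness, so it can at most mask the symptom.

```rust
use std::hint::black_box;
use std::ptr;

/// Hypothetical helper: rebuild a box from an exposed address, treating the
/// address as opaque to the optimizer.
///
/// # Safety
/// `addr` must be the exposed address of a live `Box<T>` allocation that has
/// been leaked with `Box::into_raw` and not yet been reclaimed.
unsafe fn box_from_addr_opaque<T>(addr: usize) -> Box<T> {
    // Best-effort barrier: discourages the compiler from reasoning about the value.
    let addr = black_box(addr);
    let raw: *mut T = ptr::with_exposed_provenance_mut(addr);
    assert!(!raw.is_null());
    // Safety: guaranteed by the caller (see the function-level contract).
    unsafe { Box::from_raw(raw) }
}

fn main() {
    let addr = Box::into_raw(Box::new(7_u32)).expose_provenance();
    let restored: Box<u32> = unsafe { box_from_addr_opaque(addr) };
    assert_eq!(*restored, 7);
}
```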

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged

    Labels

    A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    C-bug: Category: This is a bug.
    I-miscompile: Issue: Correct Rust code lowers to incorrect machine code.
    I-prioritize: Issue: Indicates that prioritization has been requested for this issue.
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
    T-opsem: Relevant to the opsem team.
    regression-from-stable-to-stable: Performance or correctness regression from one stable version to another.