Representation of unions #13

@nikomatsakis
Contributor

Discussing how unions are laid out.

  • Is #[repr(C)] meaningful when applied to a union?
  • When (if ever) do we guarantee that all fields start at offset 0?
  • When (if ever) do we guarantee that all fields have the same address?
  • Any key things to note re: FFI interop?

Activity

joshtriplett

joshtriplett commented on Aug 30, 2018

@joshtriplett
Member

#[repr(C)] is meaningful (it guarantees that the union will have the same layout as an equivalent C union would); we do need to determine if #[repr(Rust)] wants to diverge from that, though. (Or if we want to guarantee that it'll never diverge in the future.)

Also, some relevant text from C11:
6.5.2.3.6:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

6.5.8.5:

All pointers to members of the same union object compare equal.

6.7.2.1.16:

A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
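
As a rough illustration of what the last two quotes mean for a #[repr(C)] union in Rust, the sketch below measures field addresses, size, and alignment. The type IntOrBytes and both helper functions are made up for this example and do not come from the thread.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

// Hypothetical example type; it does not appear in the thread.
#[repr(C)]
union IntOrBytes {
    word: u32,
    bytes: [u8; 6],
}

/// Byte offsets of `word` and `bytes` relative to the start of the union.
fn field_offsets() -> (usize, usize) {
    let u = IntOrBytes { word: 0 };
    let base = addr_of!(u) as usize;
    // `addr_of!` only computes addresses; no field is read here. The `unsafe`
    // block is kept because some compiler versions require it for union fields.
    let (word_addr, bytes_addr) =
        unsafe { (addr_of!(u.word) as usize, addr_of!(u.bytes) as usize) };
    (word_addr - base, bytes_addr - base)
}

/// Size and alignment of the union: the largest field size rounded up to the
/// largest field alignment, as a C compiler would lay out the same union.
fn size_and_align() -> (usize, usize) {
    (size_of::<IntOrBytes>(), align_of::<IntOrBytes>())
}
```

On a target where u32 is 4-byte aligned, both offsets should come out as 0 and the size as 8 (6 bytes rounded up to a multiple of 4).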

hanna-kruppe

hanna-kruppe commented on Aug 30, 2018

@hanna-kruppe

AFAIK the first paragraph you quoted (6.5.2.3.6) is solely about strict aliasing/TBAA, which Rust doesn't do. The other two quotes seem to practically guarantee that all members of the union start at the same offset, and that this offset is 0 unless the union foo * -> struct union_member * cast adjusts the pointer. Does that sound right?
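
To make the "common initial sequence" case concrete, here is a minimal sketch in Rust. The types Ping, Text, and Message and the function tag_of are hypothetical. Because Rust has no type-based alias analysis, the read below only relies on both structs being repr(C) with a u32 as their first field, and on those four bytes being initialized.

```rust
// Hypothetical message types; both start with the same `tag: u32` field.
#[repr(C)]
#[derive(Clone, Copy)]
struct Ping {
    tag: u32,
    seq: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Text {
    tag: u32,
    len: u16,
}

#[repr(C)]
union Message {
    ping: Ping,
    text: Text,
}

/// Reads the shared leading `tag` without knowing which variant is stored.
fn tag_of(msg: &Message) -> u32 {
    // Both structs are `repr(C)` and begin with a `u32`, so `tag` lives at
    // offset 0 of the union whichever field was last written. Only those
    // four bytes are read here.
    unsafe { msg.text.tag }
}
```

Calling tag_of on a Message built through either field returns the tag that was stored.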

joshtriplett

joshtriplett commented on Aug 30, 2018

@joshtriplett
Member

@rkruppe Yes. (Also, I think the "suitably converted" there exclusively means a type conversion, not a value change.)

nikomatsakis

nikomatsakis commented on Sep 5, 2018

@nikomatsakis
ContributorAuthor

> we do need to determine if #[repr(Rust)] wants to diverge from that, though. (Or if we want to guarantee that it'll never diverge in the future.)

Seems like we might as well reserve the right for that, although I don't see much motivation. Maybe we should drill in to some of the more specific questions:

  • Do we guarantee that fields begin at offset 0 for #[repr(Rust)] unions?
  • Or that they have the same address?

I am sort of tempted to do so, because I don't know that there is much practical utility to doing otherwise, but I'd be curious to hear of use cases.

joshtriplett

joshtriplett commented on Sep 5, 2018

@joshtriplett
Member

In the interests of full evaluation of alternatives: the only argument I've heard for doing otherwise would be if we could detect that all the variants in a repr(Rust) union had "holes" in their representations, and then arrange those representations within the union so as to give the overall union a "hole". We could only do so if we 1) allowed flexibility in representation for repr(Rust) unions, and 2) prohibited repr(Rust) unions from containing arbitrary unknown bit patterns not expressed by the field types (which repr(C) unions have to allow).

I don't believe we should do either of those things, but I wanted to mention the arguments for doing so for completeness.

hanna-kruppe

hanna-kruppe commented on Sep 5, 2018

@hanna-kruppe

That's an interesting line of thought! However, what could we actually use these "holes" for? I assume you're referring to padding. I'm not aware of any way to stash discriminants or other data in a contained type's padding. Any write or copy is allowed to omit or clobber padding bytes at random. For example, if Result<(u8, u32), u8> placed the discriminant or the u8 in the padding of the tuple, that would break as soon as someone took a mutable reference to the tuple and wrote to it.
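
For a sense of how much padding is at stake, the sketch below counts the padding bytes in a repr(C) stand-in for (u8, u32). The struct Pair and the function padding_bytes are hypothetical. repr(C) is used only so that the field order is fixed, since the layout of the tuple itself is unspecified.

```rust
use std::mem::size_of;

// `repr(C)` stand-in for the tuple `(u8, u32)`, so the field order is fixed.
#[repr(C)]
struct Pair {
    small: u8,
    big: u32,
}

/// Number of bytes in `Pair` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Pair>() - (size_of::<u8>() + size_of::<u32>())
}
```

On a target where u32 is 4-byte aligned this reports 3 bytes. Those are exactly the bytes that a write through a mutable reference to the pair is free to overwrite or leave untouched.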

gnzlbg

gnzlbg commented on Sep 5, 2018

@gnzlbg
Contributor

> Do we guarantee that fields begin at offset 0 for #[repr(Rust)] unions?

Do we want #[repr(Rust)] unions? What do they allow? E.g. I would be fine with just requiring that all unions must be #[repr(C)] for now, adding a warning for non #[repr(C)] unions, and maybe in the 2018 edition turning that into an error.

We can then, at some point, re-consider adding #[repr(Rust)] unions, with suitable motivation. I am not saying that they would be useless, but that if we are going to specify them exactly the same as #[repr(C)], we don't need both, and not being able to use them in C-FFI would be a downside of repr(Rust) union w.r.t. repr(C) here.

We don't have to allow all kinds of types for all reprs, and doing so makes us waste time in the specification of each repr.

joshtriplett

joshtriplett commented on Sep 5, 2018

@joshtriplett
Member

@gnzlbg We already have repr(Rust) unions in stable, and people are actively using them. People want to be able to build the space-efficient data structures they allow, and similar.
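
A minimal sketch of the space saving, assuming a free-list slot whose "free or used" state is tracked somewhere else (for example in a bitmap). TaggedSlot, RawSlot, and slot_sizes are hypothetical names for this illustration.

```rust
use std::mem::size_of;

// An enum carries its own discriminant alongside the payload.
enum TaggedSlot {
    Free(usize),
    Used(u64),
}

// A union stores only the payload; the caller must track which field is live.
union RawSlot {
    next_free: usize,
    value: u64,
}

/// Returns (size of the enum, size of the union) in bytes.
fn slot_sizes() -> (usize, usize) {
    (size_of::<TaggedSlot>(), size_of::<RawSlot>())
}
```

The union is as large as its largest field, while the enum needs extra room for the tag, so the first number should be larger than the second.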

joshtriplett

joshtriplett commented on Sep 5, 2018

@joshtriplett
Member

@rkruppe I'm not talking about padding. I'm talking about things like enums and bool not using all the bits in their representation.

If I have a repr(Rust) union of a bool and a three-variant enum, and I then wrap that in an Option, could that fit in one byte?

Again, I don't think that we should do that, but people have suggested doing so.
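
The question can be probed with size_of. This is only a sketch: Color, BoolOrColor, and sizes are hypothetical names, and the sizes it reports are what a given compiler happens to produce, not a guarantee.

```rust
use std::mem::size_of;

// A three-variant enum: only three of the 256 bit patterns of its byte are used.
#[derive(Clone, Copy)]
enum Color {
    Red,
    Green,
    Blue,
}

// Default-repr union of two types that each leave bit patterns unused.
union BoolOrColor {
    flag: bool,
    color: Color,
}

/// Returns the sizes of Option<bool>, the union, and Option of the union.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Option<bool>>(),
        size_of::<BoolOrColor>(),
        size_of::<Option<BoolOrColor>>(),
    )
}
```

With rustc as I understand its current behavior, Option<bool> fits in one byte because bool has spare bit patterns, whereas the Option around the union needs a second byte, since the union is treated as able to hold any bit pattern.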

hanna-kruppe

hanna-kruppe commented on Sep 5, 2018

@hanna-kruppe

Ah, that makes more sense. I also don't think we should do this, though; I'm in favor of the "unions are bags of uninterpreted bits" approach that we seem to be slowly converging on (e.g. with the disposition to merge rust-lang/rfcs#2514).

14 remaining items

gnzlbg

gnzlbg commented on Oct 12, 2018

@gnzlbg
Contributor

What is the layout of union variants when the repr of the different variants differ? E.g.:

use std::arch::x86_64::__m128; // assumes an x86_64 target

union U {
    a: __m128, // repr(simd)
    b: (),
}

Currently, U.a appears to have array layout instead of repr(simd) layout.

hanna-kruppe

hanna-kruppe commented on Oct 12, 2018

@hanna-kruppe

Memory layout is no different. Calling convention details are different (and that factors into the relatively superficial difference in the IR we produce observed over there in that PR), but I see no reason to specify those for repr(Rust) unions, as they are irrelevant outside of FFI, for which one should use repr(C) anyway.
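
One way to check the "memory layout is no different" claim is to compare size and alignment of the union with those of __m128 itself. The sketch below assumes an x86_64 target and returns None elsewhere. The function name simd_union_layout is made up, and the result describes what the compiler in use does rather than a documented guarantee.

```rust
/// On x86_64, returns ((size, align) of the union, (size, align) of `__m128`).
#[cfg(target_arch = "x86_64")]
fn simd_union_layout() -> Option<((usize, usize), (usize, usize))> {
    use std::arch::x86_64::__m128;
    use std::mem::{align_of, size_of};

    // Same shape as the union in the question, with the default repr.
    #[allow(dead_code)]
    union U {
        a: __m128,
        b: (),
    }

    Some((
        (size_of::<U>(), align_of::<U>()),
        (size_of::<__m128>(), align_of::<__m128>()),
    ))
}

/// Fallback for targets without `__m128`.
#[cfg(not(target_arch = "x86_64"))]
fn simd_union_layout() -> Option<((usize, usize), (usize, usize))> {
    None
}
```

If the two pairs match, the union occupies memory exactly like the vector type, whatever IR type is chosen for it.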

RalfJung

RalfJung commented on Oct 12, 2018

@RalfJung
Member

So the distinction between [...] and <...> in LLVM should just affect calling conventions, but it does affect more, and that's the problem in that PR?

hanna-kruppe

hanna-kruppe commented on Oct 12, 2018

@hanna-kruppe

No, arrays vs vectors is a quite important distinction for the IR, but none of those differences except ABI lowering affect the sort of visible behavior we are documenting here.

RalfJung

RalfJung commented on Oct 13, 2018

@RalfJung
Member

To summarize the discussion that happened here, the consensus seems to be that repr(C) unions have all their fields at offset 0, and for repr(Rust) that's likely the most sensible option but we'll have to await the discussion about validity invariants for unions to rule out use-cases like #13 (comment). Any objections?

nikomatsakis

nikomatsakis commented on Oct 23, 2018

@nikomatsakis
ContributorAuthor

@RalfJung sounds great to me! Do you think you can get a write-up done by this Thursday? Would be good to have something by the meeting. =)

RalfJung

RalfJung commented on Oct 24, 2018

@RalfJung
Member

Done: #39

Feels rather short, but what else is there to say?

gnzlbg

gnzlbg commented on Nov 20, 2018

@gnzlbg
Contributor

It would be useful to use unions like MaybeUninit<T> in C FFI, but for that MaybeUninit<T> would need to have the same repr as T. Currently we don't have anything like repr(transparent) for unions, and for the case of MaybeUninit<T>, which is a union with two variants, it's unclear to me whether something like repr(transparent) would work.
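
The intended FFI use looks roughly like the sketch below. The function fill_value is a hypothetical stand-in, written in Rust, for a C function that writes through an out-parameter. For what it's worth, the current standard library documentation does state that MaybeUninit<T> has the same size, alignment, and ABI as T, which is what makes this pattern workable today.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for a C function that initializes `*out`.
unsafe extern "C" fn fill_value(out: *mut u64) {
    // The caller must pass a pointer that is valid for writes.
    unsafe { out.write(42) }
}

/// Lets the callee initialize a slot that Rust never initialized itself.
fn read_via_out_param() -> u64 {
    let mut slot = MaybeUninit::<u64>::uninit();
    // `as_mut_ptr` yields a `*mut u64` without claiming the value is initialized.
    unsafe { fill_value(slot.as_mut_ptr()) };
    // The callee has written the value, so it is now initialized.
    unsafe { slot.assume_init() }
}
```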

RalfJung

RalfJung commented on Jun 29, 2019

@RalfJung
Member

Turns out regex relies on repr(Rust) unions having their fields at offset 0:

https://github.com/rust-lang/regex/blob/172898a4fda4fd6a2d1be9fc7b8a0ea971c84ca6/src/vector/ssse3.rs#L80-L83

I bet they are not the only ones...
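
The pattern in question looks roughly like the sketch below. It is not the regex code: Lanes, observed_offsets, and first_word_as_bytes are hypothetical names. The reinterpretation only gives the expected answer if both fields of the default-repr union start at offset 0, which is what compilers do in practice but what this thread has not yet promised.

```rust
use std::ptr::addr_of;

// Hypothetical union with the default (`repr(Rust)`) layout.
union Lanes {
    bytes: [u8; 16],
    words: [u32; 4],
}

/// Offsets of both fields as observed on the compiler used to build this.
fn observed_offsets() -> (usize, usize) {
    let u = Lanes { bytes: [0; 16] };
    let base = addr_of!(u) as usize;
    // Only addresses are computed here; no field is read.
    let (bytes_addr, words_addr) =
        unsafe { (addr_of!(u.bytes) as usize, addr_of!(u.words) as usize) };
    (bytes_addr - base, words_addr - base)
}

/// Writes through `words` and reads back through `bytes`, the kind of
/// reinterpretation that silently depends on both fields sharing offset 0.
fn first_word_as_bytes(word: u32) -> [u8; 4] {
    let u = Lanes { words: [word, 0, 0, 0] };
    // All 16 bytes were initialized by the write to `words`.
    let bytes = unsafe { u.bytes };
    [bytes[0], bytes[1], bytes[2], bytes[3]]
}
```

Code that wants this behavior without depending on unspecified layout can mark the union #[repr(C)].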

        Representation of unions · Issue #13 · rust-lang/unsafe-code-guidelines