Skip to content

Proposal: compressed layout of enum #3166

@frank-king

Description

@frank-king

I have an enum like this:

enum Format {
    UnsignedInt8 = 0,
    SignedInt8 = 1,
    UnsignedInt16Le = 2,
    UnsignedInt16Be = 3,
    SignedInt16Le = 4,
    SignedInt16Be = 5,
    UnsignedInt32Le = 6,
    UnsignedInt32Be = 7,
    SignedInt32Le = 8,
    SignedInt32Be = 9,
    UnsignedInt64Le = 10,
    UnsignedInt64Be = 11,
    SignedInt64Le = 12,
    SignedInt64Be = 13,
    Ieee754FloatLe = 14,
    Ieee754FloatBe = 15,
    Ieee754DoubleLe = 16,
    Ieee754DoubleBe = 17,
    Utf16Le = 18,
    Utf16Be = 19,
    Utf32Le = 20,
    Utf32Be = 21
}

It's kind of verbose and needs more codes in pattern matching, so I rewrite it like this:

enum Format2 {
    Int8 { signed: bool },                    // 0, 1
    Int16 { signed: bool, big_endian: bool }, // 2, 3, 4, 5
    Int32 { signed: bool, big_endian: bool }, // 6, 7, 8, 9
    Int64 { signed: bool, big_endian: bool }, // 10, 11, 12, 13
    Ieee754Float { big_endian: bool },        // 14, 15
    Ieee754Double { big_endian: bool },       // 16, 17
    Utf16 { big_endian: bool },               // 18, 19
    Utf32 { big_endian: bool },               // 20, 21
}

I implemented TryFrom<u8> and Into<u8> for it, to make sure Format2 has the same mapping to u8 as Format.

Everything works fine, except that the rewritten Format2 occupies 3 bytes, instead of 1.

Is there a possibility to add something like #[repr(compressed)] to make Format2's memory representation like Format?

Currently, rustc can optimize the enum to occupy only 1 byte if I comment out all the fields in the discriminants except the first one:

enum Format3 {
    Int8  { signed: bool },                    // 0, 1
    Int16 , // { signed: bool, big_endian: bool }, // 2, 3, 4, 5
    Int32 , // { signed: bool, big_endian: bool }, // 6, 7, 8, 9
    Int64 , // { signed: bool, big_endian: bool }, // 10, 11, 12, 13
    Ieee754Float  , // { big_endian: bool },        // 14, 15
    Ieee754Double , // { big_endian: bool },       // 16, 17
    Utf16 , // { big_endian: bool },               // 18, 19
    Utf32 , // { big_endian: bool },               // 20, 21
}

I'm not very familiar with rustc, but I can describe a possible algorithm to achieve this:

  1. Scan each discriminants of the enum, and calculate the number of all possible values it could have. Also records the offset start of each discriminant based on a prefix sum. If there are discriminants that:
    1. have multiple fields, the number of all possible values is the product of that of all fields;
    2. have enum, tuple or struct fields, do this recursively;
      3.have non-Copy fields, stop this algorithm.
  2. Check the sum of all possible values for the whole enum, if it is:
    1. in range 0..256, use 1 byte to represent it;
    2. in range 256..65536, use 2 bytes to represent it;
    3. ...
  3. Now each field of each discriminants has its value range (offset..offset + num_of_values). During pattern match, the discriminants are matched by its value range, and the fields are matched by its subrange. It might not be contiguous, then it can be matched by modulo or bit operations. (The value range can be extended to fit exponent of 2 for faster matching by bit operations, but the user should be able to control it.)

For example, for Int8 { signed: bool }, there are 2 possible values, and it has offset 0, so its range is 0..2; for Int16 { signed: bool, big_endian: bool }, there are 2 × 2 = 4 possible values, and it has offset 2, so its range is 2..6. If there is a match

match format {
    Format::Int16 { signed: true, big_endian } => {/* ... */}
    Format::Int32 { signed, big_endian: false } => {/* ... */}
    // ...
}

It can be translated to

let format_u8 = format as u8; // maybe by transmute
match format_u8 {
    4..6 => {
        let big_endian = format_u8 % 2 != 0;
    }
    6..10 if format_u8 % 2 == 0 {
        let signed = format_u8 - 6 >= 2;
    }
    // ...
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions