-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
I have an enum
like this:
enum Format {
UnsignedInt8 = 0,
SignedInt8 = 1,
UnsignedInt16Le = 2,
UnsignedInt16Be = 3,
SignedInt16Le = 4,
SignedInt16Be = 5,
UnsignedInt32Le = 6,
UnsignedInt32Be = 7,
SignedInt32Le = 8,
SignedInt32Be = 9,
UnsignedInt64Le = 10,
UnsignedInt64Be = 11,
SignedInt64Le = 12,
SignedInt64Be = 13,
Ieee754FloatLe = 14,
Ieee754FloatBe = 15,
Ieee754DoubleLe = 16,
Ieee754DoubleBe = 17,
Utf16Le = 18,
Utf16Be = 19,
Utf32Le = 20,
Utf32Be = 21
}
It's kind of verbose and needs more codes in pattern matching, so I rewrite it like this:
enum Format2 {
Int8 { signed: bool }, // 0, 1
Int16 { signed: bool, big_endian: bool }, // 2, 3, 4, 5
Int32 { signed: bool, big_endian: bool }, // 6, 7, 8, 9
Int64 { signed: bool, big_endian: bool }, // 10, 11, 12, 13
Ieee754Float { big_endian: bool }, // 14, 15
Ieee754Double { big_endian: bool }, // 16, 17
Utf16 { big_endian: bool }, // 18, 19
Utf32 { big_endian: bool }, // 20, 21
}
I implemented TryFrom<u8>
and Into<u8>
for it, to make sure Format2
has the same mapping to u8
as Format
.
Everything works fine, except that the rewritten Format2
occupies 3 bytes, instead of 1.
Is there a possibility to add something like #[repr(compressed)]
to make Format2
's memory representation like Format
?
Currently, rustc can optimize the enum
to occupy only 1 byte if I comment out all the fields in the discriminants except the first one:
enum Format3 {
Int8 { signed: bool }, // 0, 1
Int16 , // { signed: bool, big_endian: bool }, // 2, 3, 4, 5
Int32 , // { signed: bool, big_endian: bool }, // 6, 7, 8, 9
Int64 , // { signed: bool, big_endian: bool }, // 10, 11, 12, 13
Ieee754Float , // { big_endian: bool }, // 14, 15
Ieee754Double , // { big_endian: bool }, // 16, 17
Utf16 , // { big_endian: bool }, // 18, 19
Utf32 , // { big_endian: bool }, // 20, 21
}
I'm not very familiar with rustc, but I can describe a possible algorithm to achieve this:
- Scan each discriminants of the
enum
, and calculate the number of all possible values it could have. Also records the offset start of each discriminant based on a prefix sum. If there are discriminants that:- have multiple fields, the number of all possible values is the product of that of all fields;
- have
enum
, tuple orstruct
fields, do this recursively;
3.have non-Copy
fields, stop this algorithm.
- Check the sum of all possible values for the whole
enum
, if it is:- in range
0..256
, use 1 byte to represent it; - in range
256..65536
, use 2 bytes to represent it; - ...
- in range
- Now each field of each discriminants has its value range (
offset..offset + num_of_values
). During pattern match, the discriminants are matched by its value range, and the fields are matched by its subrange. It might not be contiguous, then it can be matched by modulo or bit operations. (The value range can be extended to fit exponent of 2 for faster matching by bit operations, but the user should be able to control it.)
For example, for Int8 { signed: bool }
, there are 2 possible values, and it has offset 0
, so its range is 0..2
; for Int16 { signed: bool, big_endian: bool }
, there are 2 × 2 = 4 possible values, and it has offset 2
, so its range is 2..6
. If there is a match
match format {
Format::Int16 { signed: true, big_endian } => {/* ... */}
Format::Int32 { signed, big_endian: false } => {/* ... */}
// ...
}
It can be translated to
let format_u8 = format as u8; // maybe by transmute
match format_u8 {
4..6 => {
let big_endian = format_u8 % 2 != 0;
}
6..10 if format_u8 % 2 == 0 {
let signed = format_u8 - 6 >= 2;
}
// ...
}