-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
make packed struct
always use a single backing integer, inferring it if not explicitly provided
#10113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
as a weak usecase (weak because this could just be a quirk of X11), the X protocol embeds bitmasks in u16 and u32 fields whose layout is dependent on the byte-order specified by the client in the protocol setup. i don't know if the preferred representation for bitmasks is defining a bunch of it's not super terrible, but having packed structs act however shifts act on the target architecture seems more correct to me. |
The That said, it's true that endianness is a problem for packed structs. However, enforcing a single consistent endianness is not a costless action. For example, if you have const Vec3 = packed struct {
x: i32,
y: i32,
z: i32,
}; then with this proposal every field access would require byte order conversion on big-endian platforms. Ultimately, the problem is that packed structs are used for two different purposes:
Ideally, we would have two different struct types to match the different use cases. Maybe something like |
No, it would not. So As an ordinary struct, but with defined field order and no extra padding. will be true, but the field order will be swapped in memory for big-endian platforms. Imho, the use cases are:
Only the last use case requires the use of extra work for handling endianess, but this is relevant in any case, as file formats, network protocols and such all need to define a byte order and using |
What happens with a very large packed struct? (Over 65536 bits). Is it still built like one giant integer, laid out in reverse field order on big endian? Also what about npot sizes (like 24 bits)? Does that round up to 32 or end at 24? |
@MasterQ32,
Sorry, I had misunderstood that part. Maybe because I wouldn't ever expect a "packed" struct to have its field order reversed on some platforms. 😄 Also, correct me if I'm wrong, but wouldn't this convention make packed structs useless for parsing binary data? Or at least you would have to always define two structs with fields A, B, C and C, B, A respectively, and then use the appropriate struct depending on platform endianness? |
The integer layout is only a mental layout, how access is implemented for non-integer backed structs isn't defined. So yeah, for huge big-endian structs, the order would still be "reverse" to allow efficient access of parts.
It will be the same as |
There are many more things you have to consider when working with packed structs. When you care about the bit order, bitfields are probably used together with mmio. It has concepts of such things like access/register sizes. I'm not sure if packed structs should have all of these covered. I have written a little library for working with mmio registers which specify bit offsets of fields, while still encompassing them within a register with a certain access/register size something like this: command_status: extern union {
raw: u32,
start: bf.Boolean(u32, 0),
recv_enable: bf.Boolean(u32, 4),
fis_recv_running: bf.Boolean(u32, 14),
command_list_running: bf.Boolean(u32, 15),
}, Here each bit offset is explicitly noted, which I believe is going to be desired when you want to deal with these kinds of layouts anyways. The only thing I'm missing is some kind of way of getting rid of the repeated register type/size. I think the goal for MMIO related structs should either be provided as a library solution like above, or at least something with a similar syntax (but hopefully without the repeated type) |
I know, this is covered by the integer-backed struct, that will be handled like the integer you base it on. See #5049 for a wider discussion of this topic |
Oh, I see. Yes that encompasses exactly what I was trying to say here. Nevermind my comment. |
To build on this, I currently (and very carefully) use packed structs to represent bit fields in MMIO hardware registers. It'd be nice if we can actually specify the backing register size. so something like this:
Where This doesn't clarify the 3 points (handling floats, etc) in the original post, so not sure if this would conflict with them. Maybe I'm asking for a completely different structure type here? Like instead of |
packed struct
issuepacked struct
always use a single backing integer, inferring it if not explicitly provided
What's the plan of action for arrays inside a Unfortunately, if standard array ordering is always used, a single large For example: const Foo= packed struct {
x: i32,
y: [2]i32,
}; With standard array ordering and this proposal |
A few options I can imagine would be:
|
There is a fairly common situation where this proposal would increase awkwardness, namely if a binary data format specifies big-endian encoding for its integers (think network headers). Consider this example: const X = packed struct {
a: u16,
b: u16,
c: u32,
} With normal field ordering, we have the following layouts:
Thus, when parsing this format into the struct, a byte swap on the individual fields needs to be performed on an LE platform, and no action is necessary on BE. With this proposal, the situation would change to:
Now both platforms need to perform a conversion. On little-endian it is still a per-field byte swap. But on big-endian we now need to reverse the order of fields without changing the fields themselves. If, on the other hand, the data format is little-endian, then no action is required on LE platforms and a whole-struct byte reversal is needed on BE. Except if the struct contains an array, as pointed out by @topolarity. Then we need to perform the big byte swap piecewise around the array(s), and also a per-element byte swap within the array, with the rules applying recursively if the elements are themselves packed structs. Before, we had a problem with byte order. Now we have two problems. |
I disagree that using
We have the guarantee that |
That depends. When parsing binary formats (especially convoluted legacy formats), I think it's a fairly common pattern to read a certain number of bytes corresponding to a particular substructure, cast it to a packed struct, extract the information needed to find and interpret other substructures, and repeat. You should not be obliged to use a deserialization library to do that. Slicing and dicing data by hand should feel natural in a low-level language like Zig. |
FWIW, I'm using That means that |
Based on @zzyxyzz's point regarding serialization, it seems worth exploring what it would take to enforce a single ordering for packed fields across little- and big-endian systems. We could define the field ordering on big-endian systems to match the little-endian field ordering. Every packed field would need a well-defined
This comes with costs on BE systems, but it depends on the field type:
The The hard cases are the >8-bit, NPOT integers like Footnotes
|
@topolarity
I admit this is less elegant than the original proposal, but I think that leaving the fields in the declared order more than makes up for that. |
That's a much simpler (better) description in the same spirit :-) I hadn't thought to choose the Maybe a less implicit rule is that any manually-aligned, byte-multiple integer ( |
Upon reflection, the alignment-based solution is also far from ideal. Having the layout change because you're off by one bit is inelegant, not to mention a footgun. But I think we're trying to fit a round peg into a square hole here. Pretty much all the hairy problems and open questions are caused by bit granularity. Any rigorous solution I can think of severely interferes with the simplicity and usability of ordinary byte-packed structs, and vice versa. So how about this: We let packed structs be packed structs, and introduce a separate bitfield type with the semantics of the original proposal. Packed structs
Bit fields
|
Yeah, all the existing proposals have some flaw (in terms of surprise, performance, etc.) by not allowing the user to be explicit about different representations across systems. If we are going to expose that representational complexity to the user, I'd prefer to just be able to override default platform ordering with an "endian(X)" tag. It would work like this:
I'm not sure whether I like this yet, but at least the user maintains strong control, along with good performance. Note: This would also orthogonalize the current proposal versus the serialization use case. Little-endian packed structs support this use case, regardless of whatever representation is chosen for big-endian machines |
I'm also not sure I like this :). My main objection would be that defining a "normal" packed struct (i.e. with sequential layout and efficient access) gets a lot more verbose: // before:
const S1 = packed struct {
a: u8,
b: u8,
c: u16,
d: u32,
};
// after:
const native_endian = std.Target.current.cpu.arch.endian();
const S2 = endian(.Little) packed struct {
a: endian(native_endian) u8,
b: endian(native_endian) u8,
c: endian(native_endian) u16,
d: endian(native_endian) u32,
}; It's also not quite clear to me how this will work with fractional-byte fields. BTW, I've found some old issues (#307, #649) where something similar was proposed, including a solution for pointers to non-byte-aligned fields (e.g., |
Seems that the design space is well-explored at this point. Thanks for the references :) I wonder if your bit field proposal above could be translated into just two variants:
The key constraint is that in a non-integer-backed This permits the common use cases of <8-bit fields and byte-aligned fields (
|
Can you explain the differece of const Packed = extern struct {
a: u32 align(1),
b: u16 align(1),
c: u32 align(1),
d: u8 align(1),
e: u64 align(1),
}; Because this use case is already supported. For me |
Two differences:
If we ignore any targets with exotic layout rules and the question of whether the described extern struct is compatible with the C ABI (which is a lot to ignore), then it's just sugar :-) FWIW, your question makes me think that a theoretical Edit: |
To provide a concrete example, consider a very similar-looking C11 struct: typedef struct {
alignas(1) uint32_t a; // keeps natural alignment - alignas has no effect
alignas(1) uint16_t b; // keeps natural alignment - alignas has no effect
alignas(1) uint32_t c; // keeps natural alignment - alignas has no effect
alignas(1) uint8_t d; // keeps natural alignment - alignas has no effect
alignas(1) uint64_t e; // keeps natural alignment - alignas has no effect
} Packed; This will have a different layout than your You might want to add a rule that |
I stand corrected: It looks like So there may be hope that a generalized
|
This implements #10113 for the self-hosted compiler only. It removes the ability to override alignment of packed struct fields, and removes the ability to put pointers and arrays inside packed structs. After this commit, nearly all the behavior tests pass for the stage2 llvm backend that involve packed structs. I didn't implement the compile errors or compile error tests yet. I'm waiting until we have stage2 building itself and then I want to rework the compile error test harness with inspiration from Vexu's arocc test harness. At that point it should be a much nicer dev experience to work on compile errors.
This is now implemented in self-hosted, which is now the default compiler. |
Yes. A "packed struct" in C is equivalent to an |
how about |
xref #12852 |
There is no magic bullet for byte order. It's painful and will always be painful. It requires too much context on the internal data and there's no simplistic and good way of representing it all, unlike what some proposals are trying to do. Not being able to use a simple packed array inside a struct is a pretty huge feature missing. Here's how you deal with byte order:
Byteswap requires impossible context. If someone tries to byteswap a struct with a packed array in it, just give an error! Introduce a second, ByteSwapAllExceptPackedStructArray() kinda function, that does as it says, for packed structs. |
So, everyone knows the problems we have with
packed structs
. Some are implementation problems, but we also have a huge problem with not having a proper specification on how they should work at all, except for "zero padding".So i want to propose a definition for
packed struct
s that is easy to understand and implement.Let's consider this code example:
What do you expect it to print?
Is it
T{ .a = 1, .b = 0 }
? Then you have assumed a little-endian platform, as on big endian, it will printT{ .a = 0, .b = 1 }
.Check it out on Compiler Explorer.
Personally, i find this confusing, thus i propose:
A packed struct will have a similar semantic as a unsigned integer with the same amount of bits. Each field is considered an unsigned integer of the same amount of bits.
To explain the idea, we have this struct:
So the packed struct implements exactly what we would do in a language without packed structs:
This means that we have a predictable model for backed structs and in case of
T2
, we can reason about it:This will be true for both little and big endian hardware, so it will do what we expect, even if the bytes on disk have a different order.
If a known byte order is required, we can use
std.mem.writeIntLittle
,@byteSwap
and others.In addition, we can declare a struct as
packed struct(u32)
, which will enforce the struct size to 32 bit, and we will get a compiler error if it isn't the case.This will also guarantee that this struct is handled as a
u32
and can be used as such, similar to enums. This also guarantees that loads and stores will not be sliced into more than one access if possible and allows atomic load/store for such packed structs. (See #5049).What needs to be clarified:
Related issues:
Regards
The text was updated successfully, but these errors were encountered: