use cases for alignment, packing and byte order #488

Closed
kyle-github opened this issue Sep 16, 2017 · 4 comments

@kyle-github

@andrewrk is probably tired of this by now, but I think it might be worth at least recording some use cases that make other proposals for struct handling less complete.

One of the problems is that there are cases where you really want to control the exact representation of data in memory and others where you just want to use a higher-level concept.

Here is a short list (not at all complete) of some reasons why you would want to control some aspect of the in-memory representation:

  1. (Alignment) Hardware twiddling: memory-mapped I/O. There are cases where a 32-bit value is mapped at some location in memory but that location is NOT 32-bit aligned. Conversely, a location may require an alignment bigger than the natural alignment of the object. This one is not specific to structs.
  2. (packing and field ordering) Saving memory. If I want to really put a lot of stuff in memory, I want to have the tightest packing possible.
  3. (internal field alignment and overall structure ordering) Controlling cache effects. There are circumstances where controlling the size and ordering of fields in a structure can be used to optimize cache behavior. You can get really strange data structures when you start thinking about column-oriented data. I may even want to have the compiler switch from array-of-struct to struct-of-array representation. That kind of representation is used in things like geometry (in games etc.), in-memory databases etc.
  4. (alignment, field ordering and byte ordering) Reading/writing binary data to a stream such as a file or network connection. A TCP/IP packet's headers are in big-endian order. PNG and JPEG specify specific byte orders. Some wide character encodings have specific byte orders.
  5. (alignment, padding) Padding allocations for "invisible" metadata or other hacks. This is much less useful in Zig than C, but there may be cases not covered in Zig. I use this hack in my own simple C ref counting implementation to store the ref count info before the data block.
  6. (field ordering) Polymorphic pointer handling. C guarantees that a pointer to a struct is also a pointer to the first (lexical) field in the struct. That is used to do a form of polymorphism by having a parent struct be the first field in a child struct. Then a pointer to a child can be cast to a parent and any parent function can be called. Reordering fields stops that from working.
  7. (field ordering, padding, alignment) Atomic access. You may want to specify that all fields in a struct are going to be accessed via atomic instructions. In that case, it is likely that the CAS-like instructions the CPU offers only work on a very limited set of sizes. Any fields smaller than the smallest CAS instruction size would need to be padded out so that you did not try to atomically access multiple small fields at once by accident. This might not be fatal to your program's correctness, but it could cause a lot of contention.
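The first-field guarantee from item 6 can be sketched in C roughly as follows. The type and function names (`shape`, `square`, `shape_sides`, `square_sides`) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "parent" type. */
struct shape {
    const char *name;
    int sides;
};

/* Hypothetical "child" type: the parent is the first member. */
struct square {
    struct shape base;   /* must stay first for the cast below to be valid */
    double edge;
};

/* C places no padding before the first member, so this always holds.
   A compiler that reordered fields would break it. */
static_assert(offsetof(struct square, base) == 0, "parent must be at offset 0");

/* A function written against the parent type only. */
static int shape_sides(const struct shape *s) {
    return s->sides;
}

/* A pointer to the child is converted to a pointer to its first member. */
static int square_sides(const struct square *sq) {
    return shape_sides((const struct shape *)sq);
}
```

If field reordering were allowed for `struct square`, the `static_assert` would be the place where the breakage shows up at compile time.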

Alignment above is actually two things. There is the alignment of the entity itself (where the struct or value starts) and there is the alignment of fields within the entity (in the case of a struct). Field alignment can be native (aligned to the natural alignment of the field itself), some minimum, or even a maximum. Some fields have a size that is not a power of two, so their alignment is not a nice multiple or fraction of that size. For instance, the x87 extended-precision long double is 80 bits wide.
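The two kinds of alignment can be told apart in C11 with `alignas`, `alignof` and `offsetof`. This is only a sketch: the struct names are hypothetical, and 64 is an assumed cache-line size rather than a universal constant.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Entity alignment: over-aligning the first member raises the alignment
   of the whole struct, so every instance starts on a 64-byte boundary. */
struct counters {
    alignas(64) uint32_t hits;
    uint32_t misses;
};

/* Field alignment: 'payload' is raised above its natural alignment of 1,
   which forces padding between 'tag' and 'payload'. */
struct frame {
    uint8_t tag;
    alignas(8) uint8_t payload[3];
};

static_assert(alignof(struct counters) == 64, "struct takes its strictest member alignment");
static_assert(sizeof(struct counters) % 64 == 0, "size is padded to a multiple of the alignment");
static_assert(offsetof(struct frame, payload) == 8, "field moved to the next 8-byte boundary");
```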

Padding is partly a consequence of alignment. You may want to pad a struct out to a certain alignment at all times. I do this kind of garbage in C:

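```c
struct rc {
    int count;
    lock_t lock;
    void (*cleanup_func)(void *data);

    /* Flexible array member that holds no data; it only forces the
       struct to be padded out to the strictest alignment in the union. */
    union {
        uint8_t dummy_u8;
        uint16_t dummy_u16;
        uint32_t dummy_u32;
        uint64_t dummy_u64;
        double dummy_double;
        void *dummy_ptr;
        void (*dummy_func)(void);
    } dummy_align[];
};
```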

I could not find another way to make sure that it would pad out this struct to be properly aligned. Probably not all of the dummy fields are necessary, but I found enough questionable things in my searches that I decided that it was better to do this hack with a belt and suspenders.
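In C11, one option that may serve the same purpose is `max_align_t` from `<stddef.h>` combined with `alignas`. The sketch below is not the original implementation: `rc_header`, `rc_alloc` and `rc_header_of` are hypothetical names, and a plain `int` stands in for `lock_t` so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before the user's data block. */
struct rc_header {
    int count;
    int lock;                           /* stand-in for lock_t */
    void (*cleanup_func)(void *data);
    /* Aligned for any standard scalar type, so the data that follows
       the header starts suitably aligned. */
    alignas(max_align_t) unsigned char data[];
};

static_assert(offsetof(struct rc_header, data) % alignof(max_align_t) == 0,
              "data must start on a max_align_t boundary");

/* Allocate header plus 'size' bytes and return a pointer to the data. */
static void *rc_alloc(size_t size, void (*cleanup)(void *data)) {
    struct rc_header *h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->count = 1;
    h->lock = 0;
    h->cleanup_func = cleanup;
    return h->data;
}

/* Step back from the data pointer to the hidden header. */
static struct rc_header *rc_header_of(void *data) {
    return (struct rc_header *)((unsigned char *)data - offsetof(struct rc_header, data));
}
```

This only covers the alignment the C standard knows about; over-aligned types (for example SIMD vectors on some targets) would still need a larger explicit value.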

Suppose you have a struct with a lot of bool fields. You may put them in the struct in a lexical order that matches how you might use them. In many cases it would be nice to tell the compiler to pack them all into a few bytes at the beginning of the struct for size. Given the extreme difference in speed between a cache hit and miss, packing data tightly so that more fits in the cache can have a huge effect on some algorithms.
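In C the closest existing tool for the bool case is bit-fields. The sketch below compares a struct of plain `bool` fields with one that uses one-bit fields; the field names are hypothetical, and bit-field layout is implementation-defined, so the exact sizes depend on the compiler and ABI.

```c
#include <assert.h>
#include <stdbool.h>

/* One byte (at least) per flag. */
struct opts_plain {
    bool verbose;
    bool debug;
    bool dry_run;
    bool force;
    bool quiet;
    bool color;
    bool strict;
    bool trace;
};

/* One bit per flag; the compiler packs adjacent bit-fields together. */
struct opts_packed {
    bool verbose : 1;
    bool debug   : 1;
    bool dry_run : 1;
    bool force   : 1;
    bool quiet   : 1;
    bool color   : 1;
    bool strict  : 1;
    bool trace   : 1;
};

static_assert(sizeof(struct opts_packed) <= sizeof(struct opts_plain),
              "packing should never make the struct larger");
```

The cost is that a bit-field has no address, so `&opts.debug` does not compile for the packed version, and every access becomes a mask-and-shift.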

It would be nice to be able to control all these "knobs" on a data element. On all data, alignment of the targets of pointers would be useful to control. With structs, it would be good to be able to control:

  • alignment of the struct itself
  • padding of the struct itself (often related to alignment so perhaps redundant)
  • order of fields (or lack of an order constraint)
  • field padding
  • field alignment (related to the above)
  • field byte order (big endian and little endian are two of the most common possibilities but definitely not all of them)
  • field combination (the bool example)

I thought I would try to tie all these thoughts into one place.

@PavelVozenilek

PavelVozenilek commented Sep 16, 2017

A few Win32 structs have non-standard alignment of their fields. Modelling binary formats is another use case. Bitfields are minefields ( http://www.catb.org/esr/structure-packing/ ).

I remember a rant by Linus Torvalds about how people really, really need to think hard about the invisible padding in structures and try to optimize it away.

I personally would prefer explicit padding, checked by the compiler to be correct, instead of the mess and guesswork.

struct foo {
  u8 field1;
  padding(1); // impossible to access or take address
  u16 field2;
  padding(2);
  u32 field3;
  u8 field4;
  padding!(0); // padding! means going against the rules
  u16 field5;
  padding!(3); 
};
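Something close to compiler-checked padding can be approximated in C11 today with explicit padding members plus `static_assert`. This is only a sketch of the idea, not the proposed syntax: `struct record` and its fields are hypothetical, and unlike `padding(n)` the padding members remain nameable and addressable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct record {
    uint8_t  tag;
    uint8_t  pad0[3];   /* explicit: the compiler would otherwise insert this silently */
    uint32_t length;
    uint16_t kind;
    uint8_t  pad1[2];   /* explicit tail padding */
};

/* If the members account for every byte, there is no hidden padding. */
static_assert(sizeof(struct record) ==
                  sizeof(uint8_t) + 3 + sizeof(uint32_t) + sizeof(uint16_t) + 2,
              "struct record contains hidden padding");

/* Pin the offsets so that a wrong padding size fails the build. */
static_assert(offsetof(struct record, length) == 4, "length must be at offset 4");
static_assert(offsetof(struct record, kind) == 8, "kind must be at offset 8");
```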


@skyfex

skyfex commented Jan 8, 2018

I think explicit support for padding is very important. Currently, if you just use special fields for this, you're forced to name them when creating an instance of the struct. You also have to manually maintain some kind of numbering scheme when naming padding. Special syntax would solve a lot of real-world issues:

pin_cnf = PinCnf { .dir = 1, ._reserved1 = 0, .input = 1, ._reserved2 = 0 };
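For comparison, C handles this case with unnamed bit-fields, which take up space but cannot be named, read, or initialized. The sketch below reuses the `dir` and `input` names from the line above; the field widths are assumptions, `unsigned` is assumed to be at least 32 bits, and bit-field layout is implementation-defined, so this is not a guaranteed register layout.

```c
#include <assert.h>

struct pin_cnf {
    unsigned dir   : 1;
    unsigned       : 1;    /* reserved: unnamed, so no _reserved1 to maintain */
    unsigned input : 1;
    unsigned       : 29;   /* reserved */
};

static_assert(sizeof(struct pin_cnf) >= 4, "all 32 declared bits must fit");

/* Only the real fields appear in the initializer. */
static struct pin_cnf pin_cnf_make(unsigned dir, unsigned input) {
    return (struct pin_cnf){ .dir = dir, .input = input };
}
```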

Should I create a dedicated issue for this? For my use case, this is the only outstanding annoyance with Zig's current implementation.

@andrewrk
Member

andrewrk commented Jan 8, 2018

> Should I create a dedicated issue for this?

Yes please, I have some thoughts which I will wait and comment on that issue when you create it.

@andrewrk
Member

This issue should be addressed by #3133
