Generalization of a struct/union that allows for explicitly defined memory layout #6478

Open
mrakh opened this issue Oct 1, 2020 · 8 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
@mrakh
Contributor

mrakh commented Oct 1, 2020

When dealing with data that depends on fixed memory offsets, such as in MMIO or serialization routines, it is sometimes convenient to specify field offsets explicitly, especially when the fields are non-contiguous or overlap at unaligned positions. Of course, any such composite type can be emulated with nested structs/unions and dummy padding fields. You could also use other workarounds, like indexing into arrays with descriptively named const variables, or hiding the logic behind getter and setter functions. However, such techniques are typically hard to read, repetitive to write, and/or error-prone, which makes them ill-suited to these use cases.
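To make the padding workaround concrete, here is a minimal sketch of a sparse register block emulated with dummy fields. The offsets are only illustrative, and the 48-bit field is stored as `[6]u8` (my assumption) so that every field is a plain C-compatible type. Each padding size has to be computed by hand, which is exactly the repetitive and error-prone part.

```zig
const std = @import("std");

// Sparse register block emulated with explicit padding fields.
// Every pad length is derived manually from the previous field's end.
const MMIOControlRegisters = extern struct {
    _pad0: [0x5e]u8,
    desc_select: u16, // wanted at 0x5e
    _pad1: [0x100 - 0x60]u8,
    enable_interrupt: u8, // wanted at 0x100
    _pad2: [0x320 - 0x101]u8,
    dma_dest_physaddr: [6]u8, // wanted at 0x320, 48 bits wide
};

// Compile-time checks that the hand-computed padding is right.
comptime {
    std.debug.assert(@offsetOf(MMIOControlRegisters, "desc_select") == 0x5e);
    std.debug.assert(@offsetOf(MMIOControlRegisters, "enable_interrupt") == 0x100);
    std.debug.assert(@offsetOf(MMIOControlRegisters, "dma_dest_physaddr") == 0x320);
}
```

Changing any one offset means touching the neighbouring pad arithmetic as well, and fields must be listed in address order rather than in whatever order reads best.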

I believe it would be nice to have a generalized composite primitive that allows for a user-defined memory layout, where each field would be qualified with a type and an offset. This would make certain patterns easier to express: (making up possible syntax)

// Sparse structures
const MMIOControlRegisters = memblock {
    enable_interrupt : u8 : 0x00000100,
    dma_dest_physaddr : u48 : 0x00000320,
    desc_select : u16 : 0x0000005e,
};

Or perhaps the offset values could be made bit-addressable:

// Making some endianness assumptions here
const WebSocketHeader = memblock {
    fin : u1 : 0,
    rsv1 : u1 : 1,
    rsv2 : u1 : 2,
    rsv3 : u1 : 3,
    opcode : u4 : 4,
    mask_flag : u1 : 8,
    len_code : u7 : 9,
    
    payload_len_small : u7 : 9,
    mask_small : u32 : 16,
    
    payload_len_med : u16 : 16,
    mask_med : u32 : 32,
    
    payload_len_large : u64 : 16,
    mask_large : u32 : 80,
};
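
For comparison, the fixed (non-overlapping) part of such a header can already be approximated with packed structs today. This is only a sketch: it covers the first two bytes, the type names `HeaderByte0` and `HeaderByte1` are made up, and it relies on packed structs placing the first listed field in the least significant bits. The overlapping length/mask variants are the part that has no direct equivalent.

```zig
const std = @import("std");

// First two bytes of a WebSocket frame header, one packed struct per byte.
// Fields are listed from the least significant bit upward, so FIN (the
// most significant bit on the wire) comes last.
const HeaderByte0 = packed struct {
    opcode: u4,
    rsv3: u1,
    rsv2: u1,
    rsv1: u1,
    fin: u1,
};

const HeaderByte1 = packed struct {
    len_code: u7,
    mask_flag: u1,
};

// Compile-time checks on the resulting bit positions.
comptime {
    std.debug.assert(@bitSizeOf(HeaderByte0) == 8);
    std.debug.assert(@bitSizeOf(HeaderByte1) == 8);
    std.debug.assert(@bitOffsetOf(HeaderByte0, "opcode") == 0);
    std.debug.assert(@bitOffsetOf(HeaderByte0, "fin") == 7);
    std.debug.assert(@bitOffsetOf(HeaderByte1, "mask_flag") == 7);
}
```

Note that the bit numbering here counts from the low bit of each byte, which is the opposite direction from the offsets in the made-up syntax above.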
@daurnimator
Contributor

I think this should be done by re-introducing the offset field (removed in 0f67781) to StructField so that @Type(.{ .Struct = .{ ... } }) can create custom layouts.

e.g.

const MMIOControlRegisterStruct = @Type(.{
    .Struct = .{
        // .layout =
        .fields = &[_]std.builtin.TypeInfo.StructField{
            .{
                .name = "enable_interrupt",
                .field_type = u8,
                .offset = 0x00000100,
            },
            .{
                .name = "dma_dest_physaddr",
                .field_type = u48,
                .offset = 0x00000320,
            },
            .{
                .name = "desc_select",
                .field_type = u16,
                .offset = 0x0000005e,
            },
        },
        // .decls = .{},
    },
});

@codehz
Contributor

codehz commented Oct 1, 2020

(But using @Type directly to generate a type is horrible; maybe we could add field tags or attributes, as other languages have.)

Technically we can parse a string at comptime and generate a struct, but that is still inconvenient...

Proposal, something like:

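const MMIOControlRegisterStruct = transform(struct {
	// Hypothetical field-tag syntax; not valid Zig today.
	enable_interrupt: u8 "offset: 0x00000100",
});
fn transform(comptime Original: type) type {
	var orig = @typeInfo(Original);
	inline for (orig.Struct.fields) |*field| {
		// Sketch only: assumes StructField has `tag` and `offset` members
		// and that a `parseTag` helper extracts the value after "offset:".
		field.offset = parseTag(field.tag, usize, "offset");
	}
	return @Type(orig);
}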

Oh, there is already a Tags proposal: #1099

@alexnask added the proposal label Oct 1, 2020
@Rocknest
Contributor

Rocknest commented Oct 1, 2020

@codehz The name of the field can serve as the tag:

const MMIOControlRegisterStruct = Transform(struct {
	enable_interrupt_OFFSET_0x00000100: u8,
	// or
	@"enable_interrupt$0x00000100": u8,
});
fn Transform(comptime Original: type) type {
	var orig = @typeInfo(Original);
	inline for (orig.Struct.fields) |*field| {
		// Sketch only: `offset` and `parseTag` are hypothetical.
		if (mem.indexOf(u8, field.name, "OFFSET_")) |pos| {
			// Parse the text that follows the "OFFSET_" marker.
			field.offset = parseTag(field.name[pos + "OFFSET_".len ..], usize);
		}
	}
	return @Type(orig);
}

@mb64
Contributor

mb64 commented Oct 3, 2020

I have another use case for opting into more well-defined layout:

fn Rc(comptime T: type) type {
    return struct {
        strong_count: usize,
        weak_count: usize,
        value: T,

        fn weak(self: *@This()) *Weak(T) {
            self.weak_count += 1;
            return @ptrCast(*Weak(T), self);
        }
    };
}

fn Weak(comptime T: type) type {
    return struct {
        strong_count: usize,
        weak_count: usize,
        value: T,

        fn upgrade(self: *@This()) ?*Rc(T) {
            if (self.strong_count == 0) return null;
            self.strong_count += 1;
            return @ptrCast(*Rc(T), self);
        }
    };
}

Currently, this wouldn't work, since Rc(T) and Weak(T) aren't guaranteed to have the same layout. Some method for guaranteeing deterministic layout would be necessary.

I don't think there's a good way to do this in Zig right now, but I'd love to be corrected. extern annotations get partway there, but require T to be extern as well.
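
To show what "partway" means, here is a minimal sketch of the extern variant, reduced to the fields only and instantiated with `u32` as an example payload. With both types declared `extern struct`, the layout follows the C ABI, so the two instantiations agree field by field; the limitation is that this only compiles when T is itself allowed inside an extern struct.

```zig
const std = @import("std");

// Same fields as above, but declared `extern` so the layout follows the
// C ABI instead of being left to the compiler.
fn Rc(comptime T: type) type {
    return extern struct {
        strong_count: usize,
        weak_count: usize,
        value: T,
    };
}

fn Weak(comptime T: type) type {
    return extern struct {
        strong_count: usize,
        weak_count: usize,
        value: T,
    };
}

// Compile-time checks that both instantiations share one layout.
comptime {
    const R = Rc(u32);
    const W = Weak(u32);
    std.debug.assert(@sizeOf(R) == @sizeOf(W));
    std.debug.assert(@offsetOf(R, "strong_count") == @offsetOf(W, "strong_count"));
    std.debug.assert(@offsetOf(R, "weak_count") == @offsetOf(W, "weak_count"));
    std.debug.assert(@offsetOf(R, "value") == @offsetOf(W, "value"));
}
```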

@Vexu added this to the 0.8.0 milestone Oct 3, 2020
@daurnimator
Contributor

Currently, this wouldn't work, since Rc(T) and Weak(T) aren't guaranteed to have the same layout. Some method for guaranteeing deterministic layout would be necessary.

I don't think there's a good way to do this in Zig right now, but I'd love to be corrected. extern annotations get partway there, but require T to be extern as well.

packed should work for this use case, though you then need to reintroduce alignment.

@mb64
Contributor

mb64 commented Oct 8, 2020

I agree that packed struct should work, though in a strictly normative sense: it ought to work but currently doesn't.

const T = struct { x: i32 };
const P = packed struct { t: T align(@alignOf(T)) };

This produces the error:

./t.zig:2:27: error: non-packed, non-extern struct 'T' not allowed in packed struct; no guaranteed in-memory representation
const P = packed struct { t: T align(@alignOf(T)) };
                          ^

@mrakh
Contributor Author

mrakh commented Oct 8, 2020

I'd add that I believe this proposal would be the best way to address the issues presented in #6349, #985, and #1214: any construction of nested anonymous structures could easily be converted to a memblock by translate-c while faithfully maintaining the field semantics and preserving the struct ABI.

@AsgardXIV

Having a simple solution to this would be really great.

C# has explicit struct layouts

[StructLayout(LayoutKind.Explicit, Size = 0xF0)]
public unsafe struct MyStruct {
    [FieldOffset(0x08)] public float Var1;
    [FieldOffset(0x0C)] public int Var2;
    ...

This makes C# really attractive for use cases such as reverse engineering, MMIO, and reading binary files, because matching an existing layout becomes very simple.

Zig feels like it should be an even more natural fit for those use cases in general, but without explicit struct layouts it falls a little short currently. No language really fills this niche fully at the moment, but this feature would go a long way to making Zig the standout in that category.

@andrewrk modified the milestones: 0.14.0, 0.15.0 Feb 9, 2025

9 participants