Allocgate #10055


Merged · 10 commits · Dec 1, 2021

Conversation

@leecannon (Contributor)

This PR is a refactor of the Allocator interface to allow inlining more often and to avoid the issue regarding @fieldParentPtr and dirty tracking that LLVM suffers from, as explained here.

Closes #10052

@daurnimator (Contributor)

This increases the size of the Allocator argument, passed to a huge number of functions, to `3 * usize`.
I expect this to have significant negative effects.

@andrewrk (Member) commented Oct 29, 2021

I think @daurnimator has a good point. Have we tried this instead?

```zig
const Allocator = struct {
    ptr: *c_void,
    vtable: *const VTable,

    const VTable = struct {
        alloc: fn (ptr: *c_void, len: usize, ptr_align: u29, len_align: u29, ret_addr: usize) Error![]u8,
        resize: fn (ptr: *c_void, buf: []u8, buf_align: u29, new_len: usize, len_align: u29, ret_addr: usize) Error!usize,
    };
};
```

This might also address the issue that @ominitay found where LLVM did not do the devirtualization, since the vtable pointer is marked constant and could point to global static constant memory. It also lets us drop the Fn suffix.
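For illustration, an implementation could wire up this proposed interface roughly as in the following sketch. `MyFba`, its fields, and the bump-allocation stub are all hypothetical, using the same pre-0.10 syntax as the snippet above:

```zig
// Hypothetical sketch of an implementation of the proposed { ptr, vtable }
// interface; names are illustrative, not part of any real API.
const MyFba = struct {
    buffer: []u8,
    end_index: usize = 0,

    // One statically-known vtable per implementation type, so every
    // Allocator handed out points at the same constant memory.
    const vtable = Allocator.VTable{
        .alloc = alloc,
        .resize = resize,
    };

    pub fn allocator(self: *MyFba) Allocator {
        return .{ .ptr = @ptrCast(*c_void, self), .vtable = &vtable };
    }

    fn alloc(ptr: *c_void, len: usize, ptr_align: u29, len_align: u29, ret_addr: usize) Allocator.Error![]u8 {
        const self = @ptrCast(*MyFba, @alignCast(@alignOf(MyFba), ptr));
        _ = ptr_align;
        _ = len_align;
        _ = ret_addr;
        // Trivial bump allocation, ignoring alignment for brevity.
        if (self.end_index + len > self.buffer.len) return error.OutOfMemory;
        const result = self.buffer[self.end_index .. self.end_index + len];
        self.end_index += len;
        return result;
    }

    fn resize(ptr: *c_void, buf: []u8, buf_align: u29, new_len: usize, len_align: u29, ret_addr: usize) Allocator.Error!usize {
        // Sketch only: refuse all resizes.
        _ = ptr;
        _ = buf;
        _ = buf_align;
        _ = new_len;
        _ = len_align;
        _ = ret_addr;
        return error.OutOfMemory;
    }
};
```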

@leecannon (Contributor, Author) commented Oct 29, 2021

I have done some extremely simple comparisons using code where the overhead of the virtual call is most noticeable.

FixedBufferAllocator

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./trivial_fba_old` | 10.8 ± 1.9 | 5.7 | 12.5 | 2.59 ± 0.57 |
| `./trivial_fba_new` | 4.6 ± 1.1 | 2.0 | 6.4 | 1.10 ± 0.31 |
| `./trivial_fba_new_vtable` | 4.2 ± 0.6 | 1.2 | 7.4 | 1.00 |
FBA code:

```zig
const std = @import("std");

var backing_buffer: [std.mem.page_size * 200]u8 = undefined;
var fba = std.heap.FixedBufferAllocator.init(&backing_buffer);

pub fn main() !void {
    const allocator = fba.allocator(); // or &fba.allocator;

    var to_deinit: [backing_buffer.len]*u8 = undefined;

    var i: usize = 0;
    while (true) : (i += 1) {
        const ptr = allocator.create(u8) catch break;
        to_deinit[i] = ptr;
    }

    for (to_deinit) |ptr| {
        allocator.destroy(ptr);
    }
}
```

GeneralPurposeAllocator

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./trivial_gpa_old` | 46.7 ± 7.1 | 28.1 | 52.3 | 1.16 ± 0.33 |
| `./trivial_gpa_new` | 43.7 ± 8.0 | 24.0 | 51.5 | 1.08 ± 0.33 |
| `./trivial_gpa_new_vtable` | 40.4 ± 9.9 | 25.7 | 49.8 | 1.00 |
GPA code:

```zig
const std = @import("std");

const amount = std.mem.page_size * 200;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator(); // or &gpa.allocator;
    defer _ = gpa.deinit();

    var to_deinit: [amount]*u8 = undefined;

    var i: usize = 0;
    while (i < amount) : (i += 1) {
        const ptr = allocator.create(u8) catch break;
        to_deinit[i] = ptr;
    }

    for (to_deinit) |ptr| {
        allocator.destroy(ptr);
    }
}
```

More extensive testing with non-trivial codebases would be required to see what the real-world impact of these changes is.

@andrewrk (Member)

Do I understand correctly that the `vtable: *const VTable` strategy seems to perform the best of all?

@leecannon (Contributor, Author)

@andrewrk yes, seems so

@ominitay (Contributor)

I'm finding in my benchmarks that the VTable consistently slows things down compared to without. example1 is the same as the GPA example which @leecannon sent.
(benchmark results image)

@ominitay (Contributor)

And regarding the FBA example, the same pattern is shown
(benchmark results image)

@ominitay (Contributor)

In fact, the generated assembly for `-new` and `-new-vtable` is byte-for-byte identical. This does not apply to the GPA example, though.

@jayschwa (Contributor)

> This increases the size of the Allocator argument passed to a huge number of functions to be 3*usize. I expect this to have significant negative effects.

Theoretically, won't Zig transform these to pass-by-reference if it's more optimal? If yes, is the vtable method simply a practical "workaround" for Zig (or LLVM) not recognizing the optimization potential in this particular situation?

@leecannon (Contributor, Author) commented Oct 29, 2021

I have been looking closer at this with perf, valgrind, etc., and they have shown either no change or worse results (cache misses, stalled cycles, etc.) for the vtable version.

Also, as @ominitay has shown, the vtable and non-vtable versions either differ by less than the margin of error, or the vtable is substantially worse.

I am going to revert the vtable change; if anyone can provide an example of where the 3-usize size of Allocator is a detriment, it can be reconsidered.

@Jarred-Sumner (Contributor) commented Oct 31, 2021

What if the allocator was a comptime argument? Is it common to dynamically change which allocator is in use at runtime? I don't think I've seen code that does that. If the allocator became a comptime argument, would that mean the cost of passing the Allocator around becomes zero? I also wonder what the impact on Zig compilation time would look like.

@ikskuh (Contributor) commented Oct 31, 2021

> What if allocator was a comptime argument? Is it common to dynamically change which allocator is in use at runtime? I don't think I've seen code that does that. If allocator became a comptime argument, would that mean the cost of passing the Allocator around becomes zero? I also wonder what the impact to zig compilation time would look like

This would not work, as comptime arguments must be constant at comptime, so something like FixedBufferAllocator, which carries runtime state, could not be passed that way. If you instead pass the allocator type, as in `fn allocates(comptime Allocator: type, allocator: Allocator) !void`, you make everything generic, and compile times will explode. Using the same module with two different allocators in your project would also duplicate the code and all functions/variables in that module.

Example: everything that takes an `ArrayList(u8)` argument would suddenly be generic over the array list, as it would become `ArrayList(Allocator, u8)`. I assume this won't scale at all and will bloat executables, similar to std.fmt or std.Writer.
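The explosion described above can be sketched as follows; these signatures are hypothetical, and `Allocator` here is a type parameter rather than the std interface:

```zig
// Hypothetical sketch: once the allocator is a comptime type parameter,
// every container parameterized on it becomes generic over it too.
fn ArrayList(comptime Allocator: type, comptime T: type) type {
    return struct {
        allocator: Allocator,
        items: []T = &[_]T{},
    };
}

// A function that used to accept a plain ArrayList(u8) must now itself be
// generic over the allocator, duplicating its code per allocator type:
fn takesBytes(comptime Allocator: type, list: ArrayList(Allocator, u8)) void {
    _ = list;
}
```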

@Jarred-Sumner (Contributor) commented Oct 31, 2021 via email

@ifreund (Member) commented Nov 1, 2021

A while ago on IRC @andrewrk mentioned this as one of several issues with the allocator interface design:

```
00:55 <andrewrk> I want to do another design pass on the Allocator interface
00:55 <andrewrk> a handful of design issues has piled up
00:55 <andrewrk> * the @fieldParentPtr way of abstracting is fundamentally more difficult to optimize than a vtable strategy
00:56 <andrewrk> * the mechanism to get more bytes than you asked for is dubious and might be at worst doing more harm than good and at best a bunch of complication for no benefit
00:57 <andrewrk> * I want allocator implementations to have access to a comptime alignment. maybe even the type??
00:57 <andrewrk> * @memset to undefined should happen in the Allocator implementations rather than the interface to avoid redundant memsets in safe modes
00:59 <andrewrk> * the API is a bit of a mess
```

In particular, I wonder how these changes interact with the desire to make a comptime alignment or type available to the allocator implementations. It would be best to do one massively breaking change instead of two if at all possible.

@marler8997 (Contributor) commented Nov 3, 2021

> the mechanism to get more bytes than you asked for is dubious and might be at worst doing more harm than good and at best a bunch of complication for no benefit

I'm curious about @andrewrk's reasoning on this. If it really has no benefit then we can remove it, but I believe there are use cases for it (e.g. ArrayList); maybe Andrew thinks those use cases can use another mechanism and be just as good?

For most cases, realloc is fine. The only case I can think of that could benefit from more than realloc is when you just want to know how much space is available, because you're not sure how much you'll need yet but would benefit from knowing the full amount available to you. Without an interface to get the available length, you might realloc more than is available and end up not actually needing it, so you would move memory when you didn't need to.

Example:

You're putting data into an ArrayList. It has allocated 200 bytes. The allocator reserved 256 bytes for it, but ArrayList doesn't know that (because we removed the mechanism to query this). The ArrayList is full but doesn't know how much it will need, so it just asks for double its size, 400 bytes. The allocator can't expand its current slot, so it has to reallocate a new larger slot and move all the memory over. In the end, the ArrayList only actually needed 30 more bytes, so this reallocation/memcpy was wasted. Is there a way to avoid this scenario if we remove the mechanism to know how much space is actually reserved by the allocator?
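The scenario above could be sketched like this; `queryFullCapacity` is a hypothetical operation used only for illustration, and the list fields are simplified:

```zig
// Hypothetical sketch: `queryFullCapacity` is an imagined allocator
// operation, not part of the current Allocator interface.
fn growForAppend(list: anytype, allocator: Allocator, needed: usize) !void {
    // e.g. list.capacity == 200, the allocator reserved 256, needed == 30
    const reserved = allocator.queryFullCapacity(list.items.ptr[0..list.capacity]);
    if (list.items.len + needed <= reserved) {
        // The slot already has room: grow in place, no realloc, no memcpy.
        list.capacity = reserved;
    } else {
        // Otherwise double the capacity as before, possibly moving memory.
        try list.ensureTotalCapacity(list.capacity * 2);
    }
}
```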

@marler8997 (Contributor) commented Nov 3, 2021

Here's an idea that should remove the need for len_align, remove the need for the @fieldParentPtr mechanism, and provide access to comptime alignment:

```zig
///
/// A new allocator API. Instead of an allocator providing 2 function pointers, it
/// provides a single function pointer that takes a tagged union representing an
/// "AllocatorRequest". This allows us to enhance/scale the allocator interface
/// with more operations without having to add additional function pointers.
///
/// Using this new ability to add more operations, we can remove the `len_align`
/// parameter to the allocation functions and add a `query_full_capacity` request
/// instead.
///
/// We can also use this new ability to easily add operations to support comptime
/// aligned allocations.
///
/// Note: I've added a context pointer to the AllocFn so we don't need to depend
/// on `@fieldParentPtr`.
pub const AllocFn = fn (self: *c_void, request: *AllocatorRequest) void;
pub const AllocatorRequest = union(enum) {
    // (`alignment` rather than `align`, which is a reserved keyword)
    alloc: struct { len: usize, alignment: u29, ret_addr: usize },

    // means we can allocate with comptime alignments up to 16
    alloc_align2: struct { len: usize, ret_addr: usize },
    alloc_align4: struct { len: usize, ret_addr: usize },
    alloc_align8: struct { len: usize, ret_addr: usize },
    alloc_align16: struct { len: usize, ret_addr: usize },

    resize: Resize,

    /// returns the full capacity available to an allocation
    query_full_capacity: struct { buf: []u8, alignment: u29 },

    pub const Resize = struct {
        buf: []u8,
        alignment: u29,
        new_len: usize,
        ret_addr: usize,
    };
};
```

@SpexGuy (Contributor) commented Nov 3, 2021

A union(enum) with runtime-known fields has a runtime-known tag, so I don't see how this makes anything comptime-known, except that you could make specialized code paths; but you can do that anyway with the current pattern, and it's not something we should force allocators to do. Running everything through one function seems potentially dubious, since in the case where the function is not inlined it's a lot of unnecessary branching. This pattern means that in order to query the size of an allocation, you need to construct a 48-byte stack value in memory and then pass its address into a virtual function. While this does solve the API problems, so does a vtable, and I think the vtable approach is probably better/faster/easier to optimize.

@marler8997 (Contributor)

> union (enum) with runtime known fields has a runtime known tag, so I don't see how this makes anything comptime-known. Except that you could make specialized code paths, but you can do that anyway with the current pattern, and it's not something we should force allocators to do.

Oh yeah, now that I think about it you might be right here. I think I need some clarification from Andrew about what he means by comptime-known alignment. My suggestion here was based on an old suggestion from him where he added new allocation function pointers for a limited set of alignments. That said, using function pointers doesn't have the same downside as union(enum) (like you say), since function pointers don't need to branch like a tagged union would.

> Running everything through one function seems potentially dubious, since in the case where the function is not inlined it's a lot of unnecessary branching.

I assume by "dubious" you mean "slower than an alternative". I can't say here, but I think we should measure if we decide to add more "operations", either via union(enum) or vtable.

> This pattern means that in order to query the size of an allocation, you need to construct a 48 byte stack value in memory and then pass the address of that into a virtual function.

We need to pass these 48 bytes of memory somehow; we can't get around that. I can't say which mechanism is faster, though. I'll grant that with a tagged union, some cases will hog more stack memory than they need, but in others I'm guessing they wouldn't? Does a tagged union with a small activated field only take up as much stack memory as that field needs? I'd be interested to know.
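For what it's worth, a tagged union is always sized for its largest field plus the tag (and any padding), regardless of which field is active, which can be checked with a small sketch (field names are illustrative):

```zig
const std = @import("std");

const Request = union(enum) {
    small: u8,
    big: [40]u8,
};

pub fn main() void {
    // @sizeOf reports the full union size: it covers the largest field
    // plus the tag, no matter which field is currently active.
    std.debug.print("@sizeOf(Request) = {d}\n", .{@sizeOf(Request)});
}
```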

> While this does solve the api problems, so does a vtable, and I think the vtable approach is probably better/faster/easier to optimize.

You could be right here and I think you have more insight on this than myself. I think the approach of breaking up our 2 operations into smaller ones has merit, whether it's better to use a vtable or a tagged union, I'm not sure.

@andrewrk (Member)

@leecannon I'd like to merge this without a6246c8. I think we should change one thing at a time and now we have the luxury of looking at how the graphs on https://ziglang.org/perf/ will change with each thing.

Can we do the interface change alone?

@leecannon (Contributor, Author)

Okay, I have pulled the shrink change out and will make a new PR for it once this is merged.

And just to note: the shrink change is a correctness fix, not a performance improvement, which is even more reason to split it out.

@andrewrk (Member) left a comment

Great work. Thank you for taking on such a monumental task.

Ready to merge as soon as we have the green light.

@andrewrk andrewrk merged commit 7355a20 into ziglang:master Dec 1, 2021
@leecannon leecannon deleted the allocator_refactor branch December 1, 2021 02:55
Snektron added a commit to Snektron/vulkan-zig that referenced this pull request Dec 1, 2021
@andrewrk (Member) commented Dec 1, 2021

Looking at https://ziglang.org/perf/, I see a major regression in maxrss on a few benchmarks after merging this. For example, the arena-allocator benchmark went from using 4 MiB to 11 GiB.

@SpexGuy (Contributor) commented Dec 1, 2021

Holy crap! There must be a broken free somewhere; the increased size of Allocator could account for at most a 2x increase in memory use, and only if every piece of memory in the program were an Allocator.

ehaas added a commit to ehaas/arocc that referenced this pull request Dec 4, 2021
Allocators are now passed by value instead of reference.
See ziglang/zig#10055

Closes Vexu#163
Vexu pushed a commit to Vexu/arocc that referenced this pull request Dec 6, 2021
Allocators are now passed by value instead of reference.
See ziglang/zig#10055

Closes #163
Snektron added a commit to Snektron/vulkan-zig that referenced this pull request Dec 8, 2021
Hejsil added a commit to Hejsil/mecha that referenced this pull request Dec 16, 2021
* Update to new Allocator API

See ziglang/zig#10055

Co-authored-by: Komari Spaghetti <[email protected]>
Co-authored-by: Nameless <[email protected]>
10 participants