-
Notifications
You must be signed in to change notification settings - Fork 15.2k
Description
Despite its age, Doug Lea's malloc
implementation (dlmalloc) still remains near the state of the art for embedded single-threaded or lightly multi-threaded allocators. While there has been much improvement in the space of general system malloc
s since dlmalloc was SOTA, much of that work owes to Hoard, which shlfted design focus towards cache locality, limiting contention, and page friendliness. These concerns typically apply much less (or not at all) in the embedded space. There has been a fair amount of work in the embedded space too, but much of it centers around improving dlmalloc (e.g. TLSF). dlmalloc is also the main implementation in newlib and picolibc, so llvm-libc's malloc
is likely to be directly compared against it.
Accordingly, this issue catalogs the characteristics of the current llvm-libc that compare unfavorably to dlmalloc. Many of these can be corrected without too much trouble; others might require some light algorithms work. It's also likely possible to build something smaller than dlmalloc that would better suite this space; after all, it wasn't originally designed for it.
All of the following assumes a 32-bit system. Take all the following with a huge grain of salt. I've only read the comments and implemented something similar to dlmalloc on the side. I haven't actually measured dlmalloc or llvm-libc's current allocator.
Space
- Minimum allocation size
- dlmalloc's minimum effective allocation size is 16 bytes, which is the minimum possible for an 8-byte aligned
malloc
with any overhead. - It's a bit hard to read out of the implementation, but it appears that the current allocator has 20 bytes of overhead per free chunk, making the minimum size at least 20 bytes. More may be required for alignment, but it's difficult to tell what the minimum or average cases would be.
- dlmalloc's minimum effective allocation size is 16 bytes, which is the minimum possible for an 8-byte aligned
- Used block efficiency
- dlmalloc's used blocks contain only a single 4 byte word as overhead. Accordingly, the minimum 16-byte chunks size can fit a 12-byte allocation.
- The current allocator has used blocks pay all but 8 bytes of the overhead of a free block. So a minimum-sized 20-byte chunk could only fit an 8 byte allocation.
Time
All dlmalloc operations finish in "constant time". This is only true in the same sense that it's true for every logarithmic time operation on a data structure stored in memory; dlmalloc uses a clever binary trie to organize its free lists.
The current allocator scans its free lists in time linear to the number of free blocks in that size class. Since the free lists are singly linked, it also does a scans when chunks are coalesced during free
, which could otherwise be done in constant time.
Initialization cost
Embedded dlmalloc typically initializes the heap lazily. Since the heap doesn't need to be zeroed, the initialization simply establishes boundary tags for the free area, which takes constant time.
llvm-libc's malloc places the heap in .bss
, which may require it to be unnecessarily zeroed on program startup. This is linear in the size of the heap.