Open Enclave memory fragmentation #615

@mjp41

Description

An OE application is seeing considerably more memory usage with snmalloc than when using dlmalloc (the OE default). The memory usage appears to grow quickly and causes crashes.

Here are some properties of the application

  • 64MiB enclave size
  • 16 worker threads + a few other threads
  • Heavy ECall usage so threads are rapidly entering and exiting the enclave.

The current conjecture is that the per thread caching of reservations, and the fragmentation within them, is causing too much memory usage. Although it looks like it could be a memory leak, that is unlikely.

Mitigations

To try to diagnose the issue, we have

  • Created detailed logging of per size class usage. (PR still required to integrate with main)
  • Used 4 KiB chunk size to reduce the possible fragmentation
  • Disabled the per thread buddy allocators for large allocations
  • Reduced maximum size for use in thread local free lists to chunk size. (There were a small number per thread of 8KiB, 16KiB and 32KiB buffers that were causing long free lists to be constructed.)

Chunks 4 KiB, no per thread buddy allocator.

With this design of snmalloc we have got the system to a steady state:

image

I left this running for about the same length of time again (~4000), and the usage did not move above the maximum shown on the graph.

At one of the higher points, we observe:

Requests for memory totalling: 25336960
Reservations of memory totalling: 31961088

This gives a 21% fragmentation due to chunks of memory not being fully used.

At one of the lower points, we observe:

Requests for memory totalling: 15862784
Reservations of memory totalling: 22618112

This gives a 30% fragmentation.
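
The two percentages above are consistent with treating fragmentation as the share of reserved memory that is not backing a request. The following is a minimal sketch of that arithmetic, assuming this is the definition being used; the function name `fragmentation` is illustrative and not part of snmalloc.

```python
def fragmentation(requested_bytes: int, reserved_bytes: int) -> float:
    """Fraction of reserved memory that is not covered by requests.

    Assumed definition: (reserved - requested) / reserved.
    """
    if reserved_bytes <= 0:
        raise ValueError("reserved_bytes must be positive")
    return (reserved_bytes - requested_bytes) / reserved_bytes


# Figures quoted in the issue for the 4 KiB chunk configuration.
samples = {
    "higher point": (25336960, 31961088),
    "lower point": (15862784, 22618112),
}

for label, (requested, reserved) in samples.items():
    print(f"{label}: {fragmentation(requested, reserved):.1%}")
# higher point: 20.7%
# lower point: 29.9%
```

Note that the absolute amount of unused reserved memory is similar at both points (roughly 6.3 to 6.4 MiB), so the percentage rises mainly because the requested total falls.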

Chunks 16 KiB, no per thread buddy allocator.

If we move to 16KiB chunks, then we see a higher memory usage due to increased fragmentation, but the overall pattern is almost identical:

image

Chunks 4 KiB, enable per thread buddy allocator.

This enables the per thread buddy allocator, but with a maximum size of 256KiB instead of the 2MiB that is the default in snmalloc. It also uses the 4 KiB chunks:

image

Here you can see that the memory appears to still be growing even after 1000 seconds. I believe this is due to the random nature in which the main worker threads and the other threads are assigned allocators. The thread local caches only return memory in larger chunks, so this leads to more fragmentation. In larger memory scenarios this is not a problem, but for a 64MiB enclave with over 16 threads, holding onto a couple of MiB per thread is a large percentage of the memory. These per thread reservations introduce approximately 25% overhead, in addition to the 20% fragmentation from underused slabs.
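
To illustrate why per thread retention matters at this enclave size, here is a rough back-of-the-envelope sketch. The per thread retention values (1 MiB and 2 MiB) are hypothetical inputs chosen for illustration, not measurements from this application, and `retained_share` is an illustrative helper, not an snmalloc API.

```python
MIB = 1024 * 1024


def retained_share(enclave_bytes: int, threads: int, retained_per_thread_bytes: int) -> float:
    """Fraction of the enclave held in per thread caches.

    Assumes every thread retains the same amount, which is a simplification.
    """
    return threads * retained_per_thread_bytes / enclave_bytes


enclave = 64 * MIB  # enclave size from the issue
workers = 16        # worker thread count from the issue

# Hypothetical retention per thread; the real figure varies by thread and over time.
for retained_mib in (1, 2):
    share = retained_share(enclave, workers, retained_mib * MIB)
    print(f"{retained_mib} MiB per thread -> {share:.0%} of the enclave")
# 1 MiB per thread -> 25% of the enclave
# 2 MiB per thread -> 50% of the enclave
```

The same retention in a multi-gigabyte heap would be well under 1% of the total, which is why this only shows up in small-heap configurations.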

I ran the benchmark for a weekend and this is the plot of the maximum memory:

image

The system takes a very long time to stabilise the growth in memory usage, but after around 8 hours it seems to have almost reached the maximum, and after another 8 hours it seems to stop growing. However, after leaving it for another half day, I observed an additional 256KiB growth in the max usage.

Short term solution

  • Build new configuration that disables thread local buddy allocators when the heap is small (Conditional range #617)
  • Confirm with team that this addresses memory growth in production
  • Build new release of snmalloc with these changes
  • Integrate changes into OE
  • Run performance testing on OE partners' workloads to see the perf effects of the changes.

Alternative solutions

Disabling the per thread buddy allocators is going to harm performance in highly contended scenarios. An alternative solution is to build a concurrent buddy allocator that does not use a global lock to protect access. This is substantial work and further investigations should be made to understand the performance.
