Lunderberg (Contributor)

If the Relax VM attempts to allocate more memory than is available on
the GPU, it should raise an exception.  Prior to this commit, an
out-of-memory exception instead triggered a segfault within
`"vm.builtin.alloc_storage"`.

When an allocation succeeds, the sequence of events is:

1. A `StorageObj` instance is constructed.
2. A call is made to `alloc->Alloc`, which returns the allocated buffer.
3. The allocated buffer is assigned to `StorageObj::buffer`.
4. The allocator is assigned to `StorageObj::allocator`.

However, when the GPU has insufficient memory, the sequence instead is:

1. A `StorageObj` instance is constructed.
2. A call is made to `alloc->Alloc`, which raises an out-of-memory exception.
3. In unwinding the stack, the `StorageObj` destructor is called.
4. The `StorageObj` destructor calls `allocator->Free(buffer)`.  Since
   neither `allocator` nor `buffer` has been assigned yet, this
   dereferences an uninitialized pointer and causes a segfault.
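The failure sequence above can be reproduced in a small standalone sketch. `MockAllocator` and `MockStorage` are hypothetical stand-ins, not the TVM classes, and `std::bad_alloc` stands in for whatever error the device allocator actually raises. To keep the program well-defined, the sketch default-initializes the fields and records, through an `initialized` flag, whether they were assigned before the destructor ran, instead of reading an uninitialized pointer as the original code did.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-ins for the VM types (not the real TVM classes).
std::vector<std::string> g_events;

struct MockAllocator {
  bool out_of_memory = false;
  void* Alloc(std::size_t nbytes) {
    if (out_of_memory) throw std::bad_alloc();  // simulate a GPU OOM
    return ::operator new(nbytes);
  }
  void Free(void* ptr) { ::operator delete(ptr); }
};

struct MockStorage {
  void* buffer = nullptr;
  MockAllocator* allocator = nullptr;
  bool initialized = false;  // set once buffer and allocator are assigned
  ~MockStorage() {
    // The buggy destructor called allocator->Free(buffer) unconditionally.
    // Here we only record what it would have seen, then free if it is safe.
    g_events.push_back(initialized ? "dtor: fields set" : "dtor: fields unset");
    if (initialized) allocator->Free(buffer);
  }
};

// Mirrors the original order: construct the storage first, then allocate.
void AllocStorageOriginalOrder(MockAllocator* alloc, std::size_t nbytes) {
  auto storage = std::make_unique<MockStorage>();  // step 1
  storage->buffer = alloc->Alloc(nbytes);          // step 2 (may throw)
  storage->allocator = alloc;                      // steps 3-4
  storage->initialized = true;
}  // storage is destroyed here on success, or during unwinding on failure
```

When `Alloc` throws, the owning pointer's destructor still runs during stack unwinding, and it observes the fields in their unassigned state. That is exactly the point at which the original `StorageObj` destructor crashed.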

This commit implements two independent fixes for this bug.

First, the `"vm.builtin.alloc_storage"` function is reordered to call
`alloc->Alloc(...)` before constructing the `StorageObj` instance.  If
an exception is raised during the allocation, there is no `StorageObj`
instance whose destructor must be called.
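A minimal sketch of the first fix follows, again using hypothetical `MockAllocator` and `MockStorage` types rather than the real TVM classes, with `std::bad_alloc` standing in for the device OOM error. Because the buffer is allocated before any storage object exists, a throwing `Alloc` leaves nothing to destroy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-ins (not the real TVM classes).
int g_storage_dtor_calls = 0;

struct MockAllocator {
  bool out_of_memory = false;
  void* Alloc(std::size_t nbytes) {
    if (out_of_memory) throw std::bad_alloc();  // simulate a GPU OOM
    return ::operator new(nbytes);
  }
  void Free(void* ptr) { ::operator delete(ptr); }
};

struct MockStorage {
  void* buffer;
  MockAllocator* allocator;
  MockStorage(void* buffer, MockAllocator* allocator)
      : buffer(buffer), allocator(allocator) {}
  ~MockStorage() {
    ++g_storage_dtor_calls;
    allocator->Free(buffer);  // both fields are always set by the constructor
  }
};

// Reordered: allocate the buffer first, then construct the storage object.
std::unique_ptr<MockStorage> AllocStorage(MockAllocator* alloc, std::size_t nbytes) {
  void* buffer = alloc->Alloc(nbytes);  // if this throws, no MockStorage exists yet
  return std::make_unique<MockStorage>(buffer, alloc);
}
```

One caveat of this sketch: if constructing the storage object itself could throw, the already-allocated buffer would need its own guard to avoid a leak. The sketch does not handle that case.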

Second, the `StorageObj::allocator` field is initialized to `nullptr`
by default, and the destructor only calls `allocator->Free` if the
`allocator` is non-null.  This prevents the same error from occurring
at any other call sites that construct a `StorageObj` directly.
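The second fix can be sketched independently of the first. The types below are hypothetical stand-ins, not the TVM classes, and `std::bad_alloc` again stands in for the device OOM error. The sketch deliberately keeps the original construct-then-allocate order to show that the null-initialized `allocator` and the guarded destructor alone are enough to make an OOM unwind safely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-ins (not the real TVM classes).
int g_free_calls = 0;

struct MockAllocator {
  bool out_of_memory = false;
  void* Alloc(std::size_t nbytes) {
    if (out_of_memory) throw std::bad_alloc();  // simulate a GPU OOM
    return ::operator new(nbytes);
  }
  void Free(void* ptr) {
    ++g_free_calls;
    ::operator delete(ptr);
  }
};

struct MockStorage {
  void* buffer = nullptr;
  MockAllocator* allocator = nullptr;  // null until an allocation succeeds
  ~MockStorage() {
    // Only release the buffer if an allocator was ever assigned.
    if (allocator != nullptr) allocator->Free(buffer);
  }
};

// Original order (construct, then allocate), now safe because of the guard.
std::unique_ptr<MockStorage> AllocStorageOriginalOrder(MockAllocator* alloc,
                                                       std::size_t nbytes) {
  auto storage = std::make_unique<MockStorage>();
  storage->buffer = alloc->Alloc(nbytes);  // may throw
  storage->allocator = alloc;
  return storage;
}
```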
Lunderberg merged commit 6afbc12 into apache:main on Apr 19, 2024.
Lunderberg deleted the bugfix_segfault_when_relax_vm_oom branch on Apr 19, 2024, 13:58.