Skip to content

Detours implementation of overriding CRT #775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mjp41
Copy link
Member

@mjp41 mjp41 commented Jun 27, 2025

This commit adds an override to Windows for replacing the CRT malloc/free etc routines.

This is stacked on top of #774.

@mjp41 mjp41 force-pushed the detours branch 4 times, most recently from 7e678d6 to c64e2a1 Compare June 30, 2025 13:37
@mjp41
Copy link
Member Author

mjp41 commented Jun 30, 2025

@NeilMonday I wonder if you had any thoughts on this PR?

This commit adds an override to Windows for replacing the CRT malloc/free etc routines.
@SchrodingerZhu
Copy link
Collaborator

SchrodingerZhu commented Jul 1, 2025

I wonder if we should adjust the behavior of malloc to interact with the alloc failure hooks (aka new handlers) on windows. Otherwise, after detour is installed, new handlers may not be invoked as excepted.

The UCRT way of doing this:

// This function implements the logic of malloc().  It is called directly by the
// malloc() function in the Release CRT and is called by the debug heap in the
// Debug CRT.
//
// This function must be marked noinline, otherwise malloc and
// _malloc_base will have identical COMDATs, and the linker will fold
// them when calling one from the CRT. This is necessary because malloc
// needs to support users patching in custom implementations.
extern "C" __declspec(noinline) _CRTRESTRICT void* __cdecl _malloc_base(size_t const size)
{
    // Ensure that the requested size is not too large:
    _VALIDATE_RETURN_NOEXC(_HEAP_MAXREQ >= size, ENOMEM, nullptr);

    // Ensure we request an allocation of at least one byte:
    size_t const actual_size = size == 0 ? 1 : size;

    for (;;)
    {
        void* const block = HeapAlloc(__acrt_heap, 0, actual_size);
        if (block)
            return block;

        // Otherwise, see if we need to call the new handler, and if so call it.
        // If the new handler fails, just return nullptr:
        if (_query_new_mode() == 0 || !_callnewh(actual_size))
        {
            errno = ENOMEM;
            return nullptr;
        }

        // The new handler was successful; try to allocate again...
    }
}

@mjp41
Copy link
Member Author

mjp41 commented Jul 1, 2025

That is a great observation. I have a few thoughts:

  1. I am not happy to put it at the top level. That will cause a branch on the fast path, which is really bad. And I think will make the codegen much worse as it will stop it being tail calls everywhere.
  2. We need to determine if we were called by malloc or new as they have different behaviours, this probably means we should template in the failure call, so we can codegen new and malloc differently
  3. There are currently two places where failure turns into a nullptr:
    • if (slab == nullptr)
      {
      return nullptr;
      }
    • auto [chunk, meta] = Config::Backend::alloc_chunk(
      self->get_backend_local_state(),
      large_size_to_chunk_size(size),
      PagemapEntry::encode(
      self->public_state(), size_to_sizeclass_full(size)),
      size_to_sizeclass_full(size));
      #ifdef SNMALLOC_TRACING
      message<1024>(
      "size {} pow2size {}", size, bits::next_pow2_bits(size));
      #endif
      // set up meta data so sizeclass is correct, and hence alloc size,
      // and external pointer. Initialise meta data for a successful
      // large allocation.
      if (meta != nullptr)
      {
      meta->initialise_large(
      address_cast(chunk), freelist::Object::key_root);
      self->laden.insert(meta);
      }
      if (zero_mem == YesZero && chunk.unsafe_ptr() != nullptr)
      {
      Config::Pal::template zero<false>(
      chunk.unsafe_ptr(), bits::next_pow2(size));
      }
      return capptr_chunk_is_alloc(
      capptr_to_user_address_control(chunk));

      The second can return nullptr, but it is less obvious. These correspond to failing to allocate a large and small object respectively.
      I think if we can thread a template to these points that by default sets errno and returns nullptr. And in the case of the Windows Detours version does the check of the set_new_handler stuff.

I think with these changes we will be able to not impact performance, and meet the spec.

Does that make sense to you @SchrodingerZhu?

@SchrodingerZhu
Copy link
Collaborator

SchrodingerZhu commented Jul 2, 2025

The plan sounds solid to me. Just a minor comment:

We need to determine if we were called by malloc or new as they have different behaviours

I think UCRT is calling the new handler regardless of whether the toplevel function is using C-API or C++-API. As demonstrated in the code comment of UCRT _malloc_base function, this function is exactly the implementation of malloc, so malloc also has the behavior.

@mjp41
Copy link
Member Author

mjp41 commented Jul 2, 2025

I thought it would only call it in the case of malloc if _query_new_mode() was set? But would always call the handler for new?

We also really need to implement _recalloc, but that requires accurate object sizes:

char* a = (char*)malloc(23);
char* b = _recalloc(a, 32);
assert(b[26] == 0);

This is hard to achieve with a sizeclass allocator as we don't have the accurate size information around.

So I would like to leave _recalloc to a future PR. We could use the custom Meta data feature to implement accurate sizes.

@SchrodingerZhu
Copy link
Collaborator

I thought it would only call it in the case of malloc if _query_new_mode() was set? But would always call the handler for new?

I see. it makes sense then.

@mjp41
Copy link
Member Author

mjp41 commented Jul 15, 2025

#791 means this can now be extended to correctly cover the Windows CRT _set_new_handler behaviour. So moving to draft until that is done.

@mjp41 mjp41 marked this pull request as draft July 15, 2025 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants