Detours implementation of overriding CRT #775

mjp41 · 2025-06-27T15:54:01Z

This commit adds a Windows override that uses Detours to replace the CRT allocation routines (malloc, free, etc.).

~~This is stacked on top of #774.~~

mjp41 · 2025-06-30T13:59:28Z

@NeilMonday I wonder if you had any thoughts on this PR?


SchrodingerZhu · 2025-07-01T14:13:09Z

I wonder if we should adjust the behavior of malloc to interact with the allocation failure hooks (aka new handlers) on Windows. Otherwise, after the detour is installed, new handlers may not be invoked as expected.

The UCRT way of doing this:

// This function implements the logic of malloc().  It is called directly by the
// malloc() function in the Release CRT and is called by the debug heap in the
// Debug CRT.
//
// This function must be marked noinline, otherwise malloc and
// _malloc_base will have identical COMDATs, and the linker will fold
// them when calling one from the CRT. This is necessary because malloc
// needs to support users patching in custom implementations.
extern "C" __declspec(noinline) _CRTRESTRICT void* __cdecl _malloc_base(size_t const size)
{
    // Ensure that the requested size is not too large:
    _VALIDATE_RETURN_NOEXC(_HEAP_MAXREQ >= size, ENOMEM, nullptr);

    // Ensure we request an allocation of at least one byte:
    size_t const actual_size = size == 0 ? 1 : size;

    for (;;)
    {
        void* const block = HeapAlloc(__acrt_heap, 0, actual_size);
        if (block)
            return block;

        // Otherwise, see if we need to call the new handler, and if so call it.
        // If the new handler fails, just return nullptr:
        if (_query_new_mode() == 0 || !_callnewh(actual_size))
        {
            errno = ENOMEM;
            return nullptr;
        }

        // The new handler was successful; try to allocate again...
    }
}

mjp41 · 2025-07-01T19:08:33Z

That is a great observation. I have a few thoughts:

I am not happy to put the check at the top level. That would add a branch to the fast path, which is really bad, and I think it would make the codegen much worse because the calls would no longer all be tail calls.
We need to determine whether we were called by malloc or new, as they have different behaviours. This probably means we should template on the failure call, so we can codegen new and malloc differently.

There are currently two places where failure turns into a nullptr:

snmalloc/src/snmalloc/mem/corealloc.h

Lines 804 to 807 in 012138e

    if (slab == nullptr)
    {
      return nullptr;
    }

snmalloc/src/snmalloc/mem/corealloc.h

Lines 668 to 697 in 012138e

    auto [chunk, meta] = Config::Backend::alloc_chunk(
      self->get_backend_local_state(),
      large_size_to_chunk_size(size),
      PagemapEntry::encode(
        self->public_state(), size_to_sizeclass_full(size)),
      size_to_sizeclass_full(size));
    #ifdef SNMALLOC_TRACING
    message<1024>(
      "size {} pow2size {}", size, bits::next_pow2_bits(size));
    #endif
    // set up meta data so sizeclass is correct, and hence alloc size,
    // and external pointer. Initialise meta data for a successful
    // large allocation.
    if (meta != nullptr)
    {
      meta->initialise_large(
        address_cast(chunk), freelist::Object::key_root);
      self->laden.insert(meta);
    }
    if (zero_mem == YesZero && chunk.unsafe_ptr() != nullptr)
    {
      Config::Pal::template zero<false>(
        chunk.unsafe_ptr(), bits::next_pow2(size));
    }
    return capptr_chunk_is_alloc(
      capptr_to_user_address_control(chunk));

The second can return nullptr, but it is less obvious. These correspond to failing to allocate a small and a large object respectively.
I think we can thread a template parameter to these points that, by default, sets errno and returns nullptr, and in the Windows Detours version performs the _set_new_handler check.

I think with these changes we can avoid impacting performance and still meet the spec.
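The idea of threading a failure policy through the slow path can be sketched as follows. This is a minimal, portable illustration and not snmalloc's code: `backend_alloc`, `budget`, `ReturnNull`, `CallNewHandler`, `alloc_slow`, `alloc`, and `release_reserve` are hypothetical names, and `new_mode` / `new_handler` merely stand in for the state that `_query_new_mode()` and `_set_new_handler()` manage in the CRT.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical backend: fails while `budget` is smaller than the request.
static std::size_t budget = 0;

static void* backend_alloc(std::size_t size)
{
  return size <= budget ? std::malloc(size) : nullptr;
}

// Stand-ins (assumptions) for the CRT's new mode and installed new handler.
static int new_mode = 0;
static int (*new_handler)(std::size_t) = nullptr;

// Default policy: set errno and give up.
struct ReturnNull
{
  static bool retry(std::size_t)
  {
    errno = ENOMEM;
    return false;
  }
};

// Detours-style policy: ask the new handler whether to try again.
struct CallNewHandler
{
  static bool retry(std::size_t size)
  {
    if (new_mode != 0 && new_handler != nullptr && new_handler(size) != 0)
      return true;
    errno = ENOMEM;
    return false;
  }
};

// The policy is only consulted after the backend has already failed,
// so the fast path carries no extra branch.
template<typename OnFail>
void* alloc_slow(std::size_t size)
{
  for (;;)
  {
    void* p = backend_alloc(size);
    if (p != nullptr)
      return p;
    if (!OnFail::retry(size))
      return nullptr;
  }
}

template<typename OnFail = ReturnNull>
inline void* alloc(std::size_t size)
{
  // A real fast path (free-list pop) would sit here; omitted in this sketch.
  return alloc_slow<OnFail>(size);
}

// Example handler: frees up enough "memory" for the request, then asks for a retry.
static int release_reserve(std::size_t size)
{
  budget = size;
  return 1;
}
```

With `ReturnNull`, a failed request returns nullptr with errno set to ENOMEM. With `CallNewHandler`, new mode enabled, and `release_reserve` installed, the same request succeeds on the second attempt.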

Does that make sense to you @SchrodingerZhu?

SchrodingerZhu · 2025-07-02T03:29:41Z

The plan sounds solid to me. Just a minor comment:

> We need to determine if we were called by malloc or new as they have different behaviours

I think UCRT calls the new handler regardless of whether the top-level function uses the C API or the C++ API. As the comment on UCRT's _malloc_base function shows, this function is exactly the implementation of malloc, so malloc has this behaviour too.

mjp41 · 2025-07-02T05:49:59Z

I thought it would only call it in the case of malloc if _query_new_mode() was set? But would always call the handler for new?
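The distinction being discussed can be written down as a small decision function. This is only a sketch of the rule as stated in this thread: `Entry`, `on_failure`, `new_mode`, `handler`, and `counting_handler` are hypothetical names, with `new_mode` standing in for the value returned by `_query_new_mode()` and `handler` for the function installed with `_set_new_handler()`.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins (assumptions) for CRT state.
static int new_mode = 0;
static int (*handler)(std::size_t) = nullptr;
static int handler_calls = 0;

enum class Entry
{
  Malloc,
  New
};

// Called after an allocation has failed. Returns true if the caller
// should retry the allocation.
template<Entry E>
bool on_failure(std::size_t size)
{
  // malloc only consults the handler when new mode is non-zero;
  // operator new always consults it.
  if (E == Entry::Malloc && new_mode == 0)
    return false;
  return handler != nullptr && handler(size) != 0;
}

// Example handler that records the call and declines to retry.
static int counting_handler(std::size_t)
{
  ++handler_calls;
  return 0;
}
```

Because the entry point is a template parameter, the malloc and new variants compile to separate code, and neither pays for the other's check.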

We also really need to implement _recalloc, but that requires accurate object sizes:

char* a = (char*)malloc(23);
// _recalloc(ptr, count, size) must zero every byte beyond the original 23.
char* b = (char*)_recalloc(a, 32, 1);
assert(b[26] == 0);

This is hard to achieve with a sizeclass allocator: we only know the rounded-up sizeclass of the object, not the 23 bytes that were originally requested, so we cannot tell which tail bytes need zeroing.

So I would like to leave _recalloc to a future PR. We could use the custom metadata feature to implement accurate sizes.
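To illustrate why exact sizes are needed, here is a minimal sketch that tracks the requested size in a side table. It is not how snmalloc stores metadata: `requested_size`, `round_to_sizeclass`, `exact_malloc`, and `exact_recalloc` are hypothetical names, the power-of-two rounding is a simplification of real sizeclasses, and the table is not thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

// Hypothetical side table standing in for per-object metadata.
static std::unordered_map<void*, std::size_t> requested_size;

// Simplified sizeclass rounding: next power of two, at least 16.
static std::size_t round_to_sizeclass(std::size_t n)
{
  std::size_t c = 16;
  while (c < n)
    c <<= 1;
  return c;
}

void* exact_malloc(std::size_t size)
{
  std::size_t rounded = round_to_sizeclass(size);
  void* p = std::malloc(rounded);
  if (p != nullptr)
  {
    // Fill with a pattern so stale tail bytes are visible in the demo.
    std::memset(p, 0xAB, rounded);
    requested_size[p] = size;
  }
  return p;
}

void* exact_recalloc(void* old, std::size_t count, std::size_t size)
{
  if (size != 0 && count > SIZE_MAX / size)
    return nullptr; // count * size would overflow
  std::size_t new_size = count * size;

  // Look up and drop the old entry before the block can move.
  std::size_t old_size = 0;
  auto it = requested_size.find(old);
  if (it != requested_size.end())
  {
    old_size = it->second;
    requested_size.erase(it);
  }

  void* p = std::realloc(old, round_to_sizeclass(new_size));
  if (p == nullptr)
  {
    // The old block is still valid, so restore its entry.
    if (old != nullptr)
      requested_size[old] = old_size;
    return nullptr;
  }

  // Zero only the bytes beyond what the caller originally asked for.
  if (new_size > old_size)
    std::memset(static_cast<char*>(p) + old_size, 0, new_size - old_size);
  requested_size[p] = new_size;
  return p;
}
```

In this sketch `exact_malloc(23)` occupies a 32-byte slot, so growing to 32 bytes does not move the object. Without the recorded 23, nothing would indicate that bytes 23 to 31 still hold stale data.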

SchrodingerZhu · 2025-07-03T16:42:28Z

> I thought it would only call it in the case of malloc if _query_new_mode() was set? But would always call the handler for new?

I see. It makes sense then.

mjp41 · 2025-07-15T09:01:23Z

#791 means this can now be extended to correctly cover the Windows CRT _set_new_handler behaviour. So moving to draft until that is done.

mjp41 force-pushed the detours branch 4 times, most recently from 7e678d6 to c64e2a1 Compare June 30, 2025 13:37

mjp41 requested a review from SchrodingerZhu June 30, 2025 14:14

mjp41 force-pushed the detours branch 2 times, most recently from 671efbc to 68d4180 Compare July 1, 2025 10:04

This was referenced Jul 1, 2025

Windows: how to override malloc / free / new / delete globally ? #700

Open

Maybe missing _base variants in overrides #638

Closed

Detours implementation of overriding CRT

148c8c0

This commit adds an override to Windows for replacing the CRT malloc/free etc routines.

mjp41 force-pushed the detours branch from 68d4180 to 148c8c0 Compare July 1, 2025 13:43

mjp41 mentioned this pull request Jul 2, 2025

Implement basic Detours Functionality to replace malloc/free/etc with symbols from snmalloc. #783

Open

mjp41 marked this pull request as draft July 15, 2025 09:01
