
[API Proposal]: RuntimeHelpers.GetAssemblyFromRvaSpan API #71268

Closed as not planned
@Sergio0694

Background and motivation

This is a follow-up to a conversation with @GrabYourPitchforks on Discord about UTF8 strings. It aims to mitigate the lack of a Utf8String type, as well as to extend support for passing readonly PE data around on the heap as a more general feature.

C# has support for easily declaring embedded PE data, via ReadOnlySpan<byte> values. For instance:

ReadOnlySpan<byte> span1 = new byte[] { 1, 2, 3, 4 };
ReadOnlySpan<byte> span2 = "Hello world"u8;

Both declarations result in spans wrapping embedded PE data. There is a big limitation with this though: there's no safe way to then pass this data around on the heap without incurring additional allocations. You can certainly create a custom MemoryManager<byte> and then use Unsafe.AsPointer(ref MemoryMarshal.GetReference(span)) to wrap the pointer to the data, which is fixed, but this doesn't account for assembly unloading, as the GC will not see that the underlying RVA field is being referenced. This is especially problematic due to not having a Utf8String type (which will likely not come at all at this point), but it applies to all embedded binary blobs in general. The second you want to pass them around on the heap, you either need to call ToArray() on them and allocate, so you can then pass them as a ReadOnlyMemory<byte>, or you need to give up on safety and just YOLO it with the workaround mentioned above. Neither of these is ideal at all.
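For reference, the allocating option described above can be sketched as follows. This is only a minimal illustration, assuming C# 11 for the u8 literal; the EmbeddedGreeting type and GetAsMemory method are hypothetical names, not part of any existing API.

```csharp
using System;

// Hypothetical helper type, used only to illustrate the allocating path.
public static class EmbeddedGreeting
{
    public static ReadOnlyMemory<byte> GetAsMemory()
    {
        // The span wraps embedded PE data directly, so this line does not allocate.
        ReadOnlySpan<byte> data = "Hello world"u8;

        // A span cannot live on the heap, so the data is copied into a new
        // byte[] on every call; the array then converts to ReadOnlyMemory<byte>.
        return data.ToArray();
    }
}
```

Each call returns a distinct copy of the same 11 bytes, which is the allocation the proposal is trying to avoid.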

We should have a fast, safe, trimming-friendly way to pass embedded PE data around on the heap.

cc. @AaronRobinsonMSFT @tannergooding

API Proposal

The idea is to solve this by adding the following intrinsic API:

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
    public static Assembly GetAssemblyForRvaSpan(ReadOnlySpan<byte> span);
}

This API would resolve the assembly from a span generated from an RVA field, and return it. If the span does not wrap an RVA field, it can just throw. This way, developers would be able to just pass around arbitrary types wrapping both the assembly and the raw pointer to the span data, allowing them to not allocate anything while still making the code safe for assembly unloading.

This could be useful in all sorts of cases where a Utf8String could've been used, and where arrays or ReadOnlyMemory<byte> instances are used today instead, after calling ToArray() on UTF8 literals, which is not ideal. Other cases where people might want to pass around embedded binary blobs could leverage this too. For instance, I'm currently using the hack mentioned above in ComputeSharp.D2D1, but I'd really like to switch to a safe approach here instead, without having to give up on keeping that code path allocation free.

API Usage

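// 'unsafe' is required because the method works with a raw pointer
static unsafe ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
    ReadOnlySpan<byte> data = "Hello world"u8;

    // Proposed API: resolves the assembly that owns the RVA field backing the span
    Assembly assembly = RuntimeHelpers.GetAssemblyForRvaSpan(data);

    // The RVA data is fixed in memory, so the pointer stays valid while the assembly is loaded
    void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));

    // The memory manager holds the assembly, keeping it (and the data) alive
    return new RvaSpanMemoryManager(ptr, data.Length, assembly).Memory;
}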

Same thing for any other arbitrary embedded binary blob.
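The RvaSpanMemoryManager type used above is not defined in this proposal. A minimal sketch of what it could look like is shown below, under the assumption that holding a reference to the Assembly object is enough to keep a collectible assembly (and its RVA data) from being unloaded. The type name matches the usage example; its members are otherwise hypothetical.

```csharp
using System;
using System.Buffers;
using System.Reflection;

// Hypothetical type: wraps a pointer to embedded PE data together with the
// assembly that owns it, so the data can be passed around as ReadOnlyMemory<byte>.
public sealed unsafe class RvaSpanMemoryManager : MemoryManager<byte>
{
    private readonly byte* _pointer;
    private readonly int _length;

    // Assumption: keeping this reference reachable prevents the owning
    // assembly from being unloaded while the memory is still in use.
    private readonly Assembly _assembly;

    public RvaSpanMemoryManager(void* pointer, int length, Assembly assembly)
    {
        _pointer = (byte*)pointer;
        _length = length;
        _assembly = assembly;
    }

    public Assembly Assembly => _assembly;

    // MemoryManager<T> requires a writable span, but the data is read-only:
    // callers should only ever expose the result as ReadOnlyMemory<byte>.
    public override Span<byte> GetSpan() => new Span<byte>(_pointer, _length);

    // The data never moves, so pinning only needs to compute the address.
    public override MemoryHandle Pin(int elementIndex = 0)
    {
        if ((uint)elementIndex > (uint)_length)
        {
            throw new ArgumentOutOfRangeException(nameof(elementIndex));
        }

        return new MemoryHandle(_pointer + elementIndex);
    }

    public override void Unpin() { }

    // Nothing to release: the memory is owned by the loaded assembly.
    protected override void Dispose(bool disposing) { }
}
```

Until an API like the one proposed here exists, the assembly argument has to be supplied by the caller (for example via typeof(SomeType).Assembly), which only works when the caller already knows which assembly declared the data.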

Alternative Designs

I'm not sure whether a RuntimeFieldHandle also keeps the source assembly alive. If it does, we could instead have an intrinsic that retrieves that handle from an RVA span, though it might be less useful than having an Assembly object. I'm not sure which would be easier to implement. I should note that this proposal doesn't necessarily have the best possible API shape. It's also meant to start a discussion on getting some way to do this, because this not being doable at all is really unfortunate.

Open questions

C# 11 and .NET 7 introduced the ability to also have constant data for primitive types that can account for endianness of a system. This means that in some cases, the returned span might not directly wrap an RVA field. Do we want to support those too somehow, or is the current proposal to only support byte spans enough? We might also just decide to extend support to additional types in the future, so the two don't necessarily have to be done at the same time.

Risks

None that I can see.
Currently developers are forced to either use unsafe code or to waste allocations, so can't be worse than that.
The API would be hidden away in compiler services, so it's not meant for general use anyway.

Labels: api-suggestion (Early API idea and discussion, it is NOT ready for implementation), area-System.Memory