Add memory prefetching to core::hint #638

@folkertdev

Proposal

Problem statement

Expose the ability to prefetch data and instructions with the goal of optimizing CPU cache usage.

Motivating examples or use cases

We are looking at translating this piece of C code to Rust:

https://github.com/facebook/zstd/blob/cfeb29e39713dadcb5f6735a129289ac06b3de73/lib/common/compiler.h#L134-L154

Solution sketch

Expose the prefetch intrinsics in core::hint:

pub enum Locality {
    None = 0,
    Locality1 = 1,
    Locality2 = 2,
    Locality3 = 3,
}

pub fn prefetch_read_data<T>(data: *const T, locality: Locality) {
    // The match ensures that the locality argument is a constant value, as required.

    // SAFETY: the prefetch intrinsics do not modify the behavior of the program. They cannot trap
    // and do not produce a value. Hence it is safe to provide an arbitrary pointer.
    unsafe {
        match locality {
            Locality::None => crate::intrinsics::prefetch_read_data(data, 0),
            Locality::Locality1 => crate::intrinsics::prefetch_read_data(data, 1),
            Locality::Locality2 => crate::intrinsics::prefetch_read_data(data, 2),
            Locality::Locality3 => crate::intrinsics::prefetch_read_data(data, 3),
        }
    }
}

The same approach applies to prefetch_write_data and prefetch_read_instruction. The core::intrinsics module also defines prefetch_write_instruction, but that operation does not really make sense, and as far as I can tell there is no system where it is anything but a no-op. Hence we skip that function for now; it can be added if a need for it ever arises.

There is a choice to make about the exact semantics. Currently the Rust intrinsic maps to https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic, which prefetches the cache line that contains the given pointer.

In Rust we might instead want to prefetch a whole value of type T, by stepping the pointer forward one cache line at a time and prefetching at each step until all size_of::<T>() bytes have been covered.

Alternatively, perhaps the pointer type should be restricted to e.g. *const u8, and writing the loop can be left to the user.
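The two options above differ mainly in who writes the per-cache-line loop. Below is a minimal sketch of that loop, not a proposed implementation. It assumes a 64-byte cache line and uses the stable `core::arch::x86_64::_mm_prefetch` as a stand-in for the proposed hint, with a no-op on other targets. The names `CACHE_LINE`, `prefetch_line`, `lines_touched`, and `prefetch_value` are hypothetical and not part of this proposal.

```rust
/// Assumed cache line width in bytes; the real value is target-specific.
pub const CACHE_LINE: usize = 64;

/// Hypothetical stand-in for the proposed hint: prefetch the line holding `ptr`.
#[cfg(target_arch = "x86_64")]
#[inline]
pub fn prefetch_line(ptr: *const u8) {
    use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // SAFETY: a prefetch does not fault and has no program-visible effect,
    // so any pointer value is acceptable here.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(ptr.cast::<i8>()) };
}

/// Fallback for other targets in this sketch: do nothing.
#[cfg(not(target_arch = "x86_64"))]
#[inline]
pub fn prefetch_line(_ptr: *const u8) {}

/// Number of cache lines of width `line` overlapped by `size` bytes at `addr`.
pub fn lines_touched(addr: usize, size: usize, line: usize) -> usize {
    if size == 0 {
        return 0;
    }
    let first = addr / line;
    let last = (addr + (size - 1)) / line;
    last - first + 1
}

/// Prefetch every cache line that a value of type `T` at `data` overlaps.
pub fn prefetch_value<T>(data: *const T) {
    let base = data.cast::<u8>();
    let n = lines_touched(base as usize, core::mem::size_of::<T>(), CACHE_LINE);
    for i in 0..n {
        // `wrapping_add` keeps the pointer arithmetic defined even when the
        // last step lands past the end of the value; it still falls in the
        // last line the value overlaps.
        prefetch_line(base.wrapping_add(i * CACHE_LINE));
    }
}
```

Under the `*const u8` alternative, only something like `prefetch_line` would live in `core::hint`, and callers would write the loop in `prefetch_value` themselves. Note that a value that is not aligned to a cache line can overlap one more line than `size_of::<T>() / CACHE_LINE` suggests, which is why the sketch counts lines from the address.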

Alternatives

There is lots to bikeshed here. For instance, maybe the enum should instead have more descriptive variant names, e.g.:

enum Locality {
    NoReuse,
    LowReuse,
    ModerateReuse,
    HighReuse,
}

We could use more enums (for read/write and data/instruction) to reduce the number of functions. I personally like having the functions though.
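As a rough illustration of the enum-based alternative, the sketch below maps three enums onto the integer arguments that `llvm.prefetch` takes after the address (`rw`, `locality`, `cache type`). The enum names `Access` and `CacheKind` and the function `llvm_prefetch_args` are hypothetical, chosen only for this example; a real single-function API would pass these values to the intrinsic rather than return them.

```rust
/// Hypothetical: whether the prefetch anticipates a read or a write.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read = 0,
    Write = 1,
}

/// Hypothetical: which cache the prefetch targets.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CacheKind {
    Instruction = 0,
    Data = 1,
}

/// Temporal locality, using the descriptive variant names suggested above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Locality {
    NoReuse = 0,
    LowReuse = 1,
    ModerateReuse = 2,
    HighReuse = 3,
}

/// Returns the `(rw, locality, cache type)` triple in the order used by
/// `llvm.prefetch`. Illustration only: nothing is prefetched here.
pub fn llvm_prefetch_args(access: Access, locality: Locality, kind: CacheKind) -> (i32, i32, i32) {
    (access as i32, locality as i32, kind as i32)
}
```

This collapses the separate functions into one entry point at the cost of three extra arguments at every call site, and it makes the write-instruction combination expressible even though it is not useful.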

The default alternative is to continue to use inline assembly to emit the correct prefetching instruction. However, this is error-prone and platform-specific.
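For comparison, here is a sketch of what the inline-assembly route can look like today. It covers only x86_64 (`prefetcht0`) and aarch64 (`prfm pldl1keep`) and silently does nothing elsewhere. The function name `prefetch_read_asm` is hypothetical, and the choice of instruction variants is an assumption made for this example.

```rust
/// Hypothetical helper: read prefetch into all cache levels on x86_64.
#[cfg(target_arch = "x86_64")]
#[inline]
pub fn prefetch_read_asm(ptr: *const u8) {
    // SAFETY: the instruction is only a hint; it does not fault and does not
    // write memory or modify flags.
    unsafe {
        core::arch::asm!(
            "prefetcht0 [{p}]",
            p = in(reg) ptr,
            options(nostack, preserves_flags, readonly),
        );
    }
}

/// Hypothetical helper: read prefetch into L1 with the "keep" policy on aarch64.
#[cfg(target_arch = "aarch64")]
#[inline]
pub fn prefetch_read_asm(ptr: *const u8) {
    // SAFETY: same reasoning as the x86_64 variant.
    unsafe {
        core::arch::asm!(
            "prfm pldl1keep, [{p}]",
            p = in(reg) ptr,
            options(nostack, preserves_flags, readonly),
        );
    }
}

/// Every other target needs its own variant; this sketch falls back to a no-op.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
#[inline]
pub fn prefetch_read_asm(_ptr: *const u8) {}
```

Each locality level and each read/write variant needs its own mnemonic per architecture, which is the duplication a `core::hint` function would remove.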

Links and related work

We currently have some support for prefetching in stdarch.

LLVM provides llvm.prefetch (see https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic), with an API that maps well to at least the x86 and aarch64 primitives. Hence, other codegen backends should be able to also provide this functionality.

Clang provides __builtin_prefetch, which exposes the LLVM intrinsic. For example: __builtin_prefetch(&b[i + PDIST], /*rw=*/1, /*locality=*/3). GCC also provides this builtin.

https://clang.llvm.org/docs/LanguageExtensions.html#builtin-prefetch

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Metadata

Labels: T-libs-api, api-change-proposal (a proposal to add or alter unstable APIs in the standard libraries)