Proposal
Problem statement
Expose the ability to prefetch data and instructions with the goal of optimizing CPU cache usage.
Motivating examples or use cases
We are looking at translating this piece of C code to Rust:
Solution sketch
Expose the prefetch intrinsics in core::hint:
pub enum Locality {
    None = 0,
    Locality1 = 1,
    Locality2 = 2,
    Locality3 = 3,
}

pub fn prefetch_read_data<T>(data: *const T, locality: Locality) {
    // The match ensures that the locality argument is a constant value, as required.
    // SAFETY: the prefetch intrinsics do not modify the behavior of the program. They cannot trap
    // and do not produce a value. Hence it is safe to provide an arbitrary pointer.
    unsafe {
        match locality {
            Locality::None => crate::intrinsics::prefetch_read_data(data, 0),
            Locality::Locality1 => crate::intrinsics::prefetch_read_data(data, 1),
            Locality::Locality2 => crate::intrinsics::prefetch_read_data(data, 2),
            Locality::Locality3 => crate::intrinsics::prefetch_read_data(data, 3),
        }
    }
}
Similar for prefetch_write_data and prefetch_read_instruction. The core::intrinsics module also defines prefetch_write_instruction, but that operation does not really make sense, and as far as I can tell there is no system where it is anything but a no-op. Hence we skip that function for now; it can be added if a need for it ever arises.
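To make the intended call site concrete, here is a rough sketch of how the proposed function might be used in a loop, in the spirit of C's __builtin_prefetch(&b[i + PDIST]). Because the core::hint function does not exist yet, the sketch defines its own stand-in with the same name and signature: on x86_64 it forwards to the existing _mm_prefetch from core::arch, and elsewhere it is a no-op. The hint mapping (3 to T0, 2 to T1, 1 to T2, 0 to NTA), the sum_with_prefetch helper, and the prefetch distance DIST are assumptions made for illustration, not part of the proposal.

```rust
/// Stand-in for the proposed `core::hint::Locality`; variant names follow the sketch above.
#[derive(Clone, Copy)]
pub enum Locality {
    None = 0,
    Locality1 = 1,
    Locality2 = 2,
    Locality3 = 3,
}

/// Hypothetical stable-Rust shim for the proposed `prefetch_read_data`.
/// Uses `_mm_prefetch` on x86_64 and does nothing on other targets.
#[inline]
pub fn prefetch_read_data<T>(data: *const T, locality: Locality) {
    #[cfg(target_arch = "x86_64")]
    {
        use core::arch::x86_64::{
            _mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2,
        };
        let p = data as *const i8;
        // SAFETY: a prefetch is only a hint; it cannot fault and produces no value.
        unsafe {
            match locality {
                Locality::None => _mm_prefetch::<_MM_HINT_NTA>(p),
                Locality::Locality1 => _mm_prefetch::<_MM_HINT_T2>(p),
                Locality::Locality2 => _mm_prefetch::<_MM_HINT_T1>(p),
                Locality::Locality3 => _mm_prefetch::<_MM_HINT_T0>(p),
            }
        }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        // No portable fallback in this sketch: the hint is simply dropped.
        let _ = (data, locality);
    }
}

/// Sums a slice while hinting at the element `DIST` positions ahead.
pub fn sum_with_prefetch(values: &[u64]) -> u64 {
    const DIST: usize = 16; // assumed prefetch distance; tune per workload
    let mut total = 0u64;
    for (i, v) in values.iter().enumerate() {
        // `wrapping_add` keeps the pointer arithmetic well-defined past the end
        // of the slice; the resulting pointer is never dereferenced.
        let ahead = values.as_ptr().wrapping_add(i + DIST);
        prefetch_read_data(ahead, Locality::Locality3);
        total = total.wrapping_add(*v);
    }
    total
}
```

Note that the caller needs no unsafe block even though the hinted address may lie outside the slice, which is the property the SAFETY comment in the proposal relies on.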
There is a choice to make about the exact semantics. Currently the Rust intrinsic maps to llvm.prefetch (https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic), which prefetches the cache line containing the given address.
In Rust we might instead want to prefetch a whole value of type T, by incrementing and prefetching the pointer by the cache line width until a full size_of::<T> bytes have been prefetched.
Alternatively, perhaps the pointer type should be restricted to e.g. *const u8, and writing the loop can be left to the user.
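The whole-value variant could look roughly like the loop below, whether it lives in the standard library or is written by the user. This is only a sketch: CACHE_LINE = 64 is an assumed width (real hardware differs), and prefetch_offsets, prefetch_value, and the prefetch_line callback are hypothetical names standing in for whatever single-line prefetch primitive ends up being exposed.

```rust
use core::mem::size_of;

/// Assumed cache-line width in bytes. 64 is common, but this is target-specific.
pub const CACHE_LINE: usize = 64;

/// Byte offsets, relative to the start of a `T`, that a whole-value prefetch
/// would touch when stepping by `CACHE_LINE`.
pub fn prefetch_offsets<T>() -> impl Iterator<Item = usize> {
    (0..size_of::<T>()).step_by(CACHE_LINE)
}

/// Calls `prefetch_line` once per `CACHE_LINE`-sized step across the value at `data`.
/// `prefetch_line` is a hypothetical single-line prefetch primitive taking `*const u8`.
pub fn prefetch_value<T>(data: *const T, mut prefetch_line: impl FnMut(*const u8)) {
    let base = data as *const u8;
    for offset in prefetch_offsets::<T>() {
        // The pointer is only passed along as a hint and never dereferenced here.
        prefetch_line(base.wrapping_add(offset));
    }
}
```

One design wrinkle this exposes: if data is not aligned to a cache line, the value can straddle one more line than this loop touches, so a real implementation would probably need to round the start address down or also hint at the last byte.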
Alternatives
There is lots to bikeshed here. For instance, maybe the enum should instead have more descriptive variant names, e.g.:
enum Locality {
    NoReuse,
    LowReuse,
    ModerateReuse,
    HighReuse,
}
We could use more enums (for read/write and data/instruction) to reduce the number of functions. I personally like having the functions though.
The default alternative is to continue to use inline assembly to emit the correct prefetching instruction. However, this is error-prone and platform-specific.
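For comparison, the inline-assembly route might look like the sketch below. The prefetch_read_asm name is hypothetical, only two architectures are covered, and the choice of prefetcht0 and prfm pldl1keep (a read prefetch into the closest cache level) is an assumption about what the caller wants. Every additional target or locality level needs another hand-written arm, which is where the error-prone, platform-specific cost comes from.

```rust
/// Hypothetical helper: read-prefetch via inline assembly, one arm per architecture.
#[inline]
pub fn prefetch_read_asm(ptr: *const u8) {
    #[cfg(target_arch = "x86_64")]
    {
        // SAFETY: `prefetcht0` is a hint; it does not fault on invalid addresses
        // and does not write to memory.
        unsafe {
            core::arch::asm!(
                "prefetcht0 byte ptr [{p}]",
                p = in(reg) ptr,
                options(nostack, preserves_flags, readonly)
            );
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        // SAFETY: `prfm` is a hint; it does not fault on invalid addresses
        // and does not write to memory.
        unsafe {
            core::arch::asm!(
                "prfm pldl1keep, [{p}]",
                p = in(reg) ptr,
                options(nostack, preserves_flags, readonly)
            );
        }
    }
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        // Other targets are not handled by this sketch; the hint is dropped.
        let _ = ptr;
    }
}
```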
Links and related work
We currently have some support for prefetching in stdarch:
- https://doc.rust-lang.org/core/arch/x86/fn._mm_prefetch.html
- https://doc.rust-lang.org/core/arch/aarch64/fn._prefetch.html (unstable; see "Tracking Issue for AArch64 prefetch intrinsic", rust#117217)
LLVM provides llvm.prefetch (see https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic), with an API that maps well to at least the x86 and aarch64 primitives. Hence, other codegen backends should be able to also provide this functionality.
Clang provides __builtin_prefetch, which exposes the LLVM intrinsic, e.g. __builtin_prefetch(&b[i + PDIST], /*rw=*/1, /*locality=*/3);. GCC also provides this builtin.
https://clang.llvm.org/docs/LanguageExtensions.html#builtin-prefetch
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.