2 changes: 2 additions & 0 deletions compiler/rustc_abi/src/lib.rs
@@ -1716,6 +1716,8 @@ pub struct AddressSpace(pub u32);
impl AddressSpace {
/// LLVM's `0` address space.
pub const ZERO: Self = AddressSpace(0);
/// The address space for shared memory on nvptx and amdgpu.
Member:

"shared" between whomst?

Contributor Author:

The concept is called “shared memory”, so I meant it as a reference to that concept, not as an explanation ;)
I’ll make this more descriptive (though as at stands, we’ll first need to decide on nomenclature to use throughout the Rust codebase).

pub const SHARED: Self = AddressSpace(3);
Member:

This should have a name that makes it more clear that it is GPU-specific.

}

/// The way we represent values to the backend
23 changes: 23 additions & 0 deletions compiler/rustc_codegen_llvm/src/declare.rs
@@ -14,6 +14,7 @@
use std::borrow::Borrow;

use itertools::Itertools;
use rustc_abi::AddressSpace;
use rustc_codegen_ssa::traits::TypeMembershipCodegenMethods;
use rustc_data_structures::fx::FxIndexSet;
use rustc_middle::ty::{Instance, Ty};
@@ -99,6 +100,28 @@ impl<'ll, CX: Borrow<SCx<'ll>>> GenericCx<'ll, CX> {
)
}
}

/// Declare a global value in a specific address space.
///
/// If there’s a value with the same name already declared, the function will
/// return its Value instead.
pub(crate) fn declare_global_in_addrspace(
&self,
name: &str,
ty: &'ll Type,
addr_space: AddressSpace,
) -> &'ll Value {
debug!("declare_global_in_addrspace(name={name:?}, addrspace={addr_space:?})");
unsafe {
llvm::LLVMRustGetOrInsertGlobalInAddrspace(
(**self).borrow().llmod,
name.as_c_char_ptr(),
name.len(),
ty,
addr_space.0,
)
}
}
}

impl<'ll, 'tcx> CodegenCx<'ll, 'tcx> {
20 changes: 19 additions & 1 deletion compiler/rustc_codegen_llvm/src/intrinsic.rs
@@ -1,7 +1,9 @@
use std::assert_matches::assert_matches;
use std::cmp::Ordering;

use rustc_abi::{Align, BackendRepr, ExternAbi, Float, HasDataLayout, Primitive, Size};
use rustc_abi::{
AddressSpace, Align, BackendRepr, ExternAbi, Float, HasDataLayout, Primitive, Size,
};
use rustc_codegen_ssa::base::{compare_simd_types, wants_msvc_seh, wants_wasm_eh};
use rustc_codegen_ssa::codegen_attrs::autodiff_attrs;
use rustc_codegen_ssa::common::{IntPredicate, TypeKind};
@@ -532,6 +534,22 @@ impl<'ll, 'tcx> IntrinsicCallBuilderMethods<'tcx> for Builder<'_, 'll, 'tcx> {
return Ok(());
}

sym::dynamic_shared_memory => {
let global = self.declare_global_in_addrspace(
"dynamic_shared_memory",
Member:

Is there a special magic meaning to this hard-coded name for the global?

When linking together Rust code and code in other languages, is this the name everyone must use consistently to ensure things behave correctly?

Contributor Author:

The name has no meaning.
I didn’t try, but I’m rather confident to claim that all external addrspace(3) globals alias. There is no way to get two different dynamic shared memory pointers.
Using the name here and updating the global’s alignment is to generate readable and “clean” LLVM IR.

Member:

> I’m rather confident to claim that all external addrspace(3) globals alias

That seems very unusual, so if the intrinsic impl relies on this there should be a comment referring to the place where this is documented.

self.type_array(self.type_i8(), 0),
AddressSpace::SHARED,
);
Comment on lines +537 to +542
Member:

Hm. I'm not sure this is the correct design for this, because it makes the calls to an intrinsic alter global program state during compilation, which feels very dicey and assumes that the calls to that intrinsic will experience codegen. I'm not sure how substantial this concern is, however, so I'm willing to be persuaded.

Member (@RalfJung), Sep 5, 2025:

Ah good point... if you call this function in dead code, it may or may not have any effect.

How do other languages handle controlling the alignment of this magic global?

Contributor Author:

To be fair, even if it is called, it does not have any effect. It’s just a getter for a pointer that exists anyway. It does not change anything if it’s dead-code-eliminated.
So, to make it clear, calling the intrinsic does not alter any program state.

Member:

We're talking about the funky behavior where DCE'ing the intrinsic removes its compile-time effect on the alignment.

Contributor Author:

Ah right, I missed that. I think it would still do the correct thing as documented. All calls that are not eliminated get a pointer that’s sufficiently aligned for their type.
It is fine/implementation-defined to return a pointer that’s aligned more than what is requested.

> How do other languages handle controlling the alignment of this magic global?

I think I can answer this now. You define an extern global of some type and the alignment of type is used. Unused extern globals can of course be removed/ignored. (This matters more when some of the globals are only used by some of the kernels defined in a program. A kernel only gets the alignment specified by globals it uses.)

Member:

Hmm. I guess I'm worrying about someone calling it with ::<u8>, casting the pointer, and then assuming that a call elsewhere (perhaps in a library they are depending on?) will enforce the alignment they want, but that call might get subject to DCE or other "non-compilation events".

Member:

That'd be wrong, yeah, but it's an unsafe operation after all so it's not surprising that you can use it wrong.

Do you have another API in mind that would make this kind of mistake less likely?

Member:

I'm sort of wondering if this should resemble thread-local statics in how it is declared, instead? So that program elements collaborating on accesses to this memory can reuse the same type.

Member:

It seems like there's only a single dynamic groupshared allocation, though, that's aliased by everything in a group, which is not at all how statics work.

let ty::RawPtr(inner_ty, _) = result.layout.ty.kind() else { unreachable!() };
let alignment = self.align_of(*inner_ty).bytes() as u32;
unsafe {
if alignment > llvm::LLVMGetAlignment(global) {
llvm::LLVMSetAlignment(global, alignment);
}
}
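Editor's note: the guarded `LLVMSetAlignment` call above only ever raises the global's alignment. A minimal host-side sketch (a hypothetical model for illustration, not compiler code; `bump_alignment` is an invented name) of this max-over-call-sites behavior, assuming the `[0 x i8]` global starts at alignment 1:

```rust
use core::mem::align_of;

/// Model of the codegen step: each surviving call to
/// `dynamic_shared_memory::<T>()` bumps the shared global's alignment
/// to max(current, align_of::<T>()).
fn bump_alignment(current: &mut u64, requested: u64) {
    if requested > *current {
        *current = requested;
    }
}

fn main() {
    // Default alignment of the zero-length i8 array global.
    let mut global_align = 1u64;
    bump_alignment(&mut global_align, align_of::<i32>() as u64);
    bump_alignment(&mut global_align, align_of::<f64>() as u64);
    // A call requesting less alignment leaves the global unchanged,
    // matching the `if alignment > LLVMGetAlignment(global)` guard above.
    bump_alignment(&mut global_align, align_of::<u8>() as u64);
    println!("final alignment: {global_align}");
}
```

Under this model, dead-code-eliminated call sites simply never bump the alignment, which is exactly the behavior the thread below debates.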
Member:

What does this do when there are multiple translation units? It seems like each would declare its own version of the global, potentially with a different alignment?

Contributor Author:

All of them return the same pointer.
I assume the linker ensures the maximum alignment is taken and the runtime will ensure that it’s honored.
(Though note that multiple translation units are currently not really supported in the backends, they both rely on IR linking.)

Member (@RalfJung), Sep 4, 2025:

> All of them return the same pointer.

By which magic does that happen? That's not usually how LLVM globals work, AFAIK.

> I assume the linker ensures the maximum alignment is taken and the runtime will ensure that it’s honored.

That would be rather surprising to me; usually a global can only be declared in one translation unit and then must be imported in all the others. Your current implementation might end up with a separate symbol for each TU, I am not sure.

EDIT: Ah I didn't realize these are extern. The name LLVMRustGetOrInsertGlobal does not make that clear. I understand the name is pre-existing.

> (Though note that multiple translation units are currently not really supported in the backends, they both rely on IR linking.)

How does that work in rustc where each crate is a separate TU?

Contributor Author:

Yes, the key is that they are extern. I agree it would not make sense for non-extern globals.

> How does that work in rustc where each crate is a separate TU?

We do (fat/full) LTO. All crates compile to bitcode, then that bitcode is linked and compiled as one LLVM IR module.
The Rust amdgpu backend uses the existing lto/linker-plugin-lto infrastructure, the Rust nvptx backend is declaring IR/bitcode as the output format and building the bitcode-linker, to link them together.

Member (@RalfJung), Sep 4, 2025:

Usually an alignment attribute on an extern global encodes an assumption -- it's a promise to LLVM that someone else has aligned the global sufficiently.

It looks like this here relies on some sort of magic handshake where whatever component actually creates the global will honor whatever alignment you can find in the bitcode?

That's pretty fragile -- usually in LLVM it's always correct to reduce the alignment attribute on such declarations, it just means we know less about what the real alignment is. (This is in contrast to alignment annotations on definitions which defines the real alignment, so it can be safely increased but not reduced.) Seems like some system here is (ab)using LLVM attributes in ways that break some of their core properties, or am I misunderstanding something?

Is this magic handshake documented anywhere?

Member:

(as mentioned elsewhere in the PR) This is where some of that handshake is implemented.

self.cx().const_pointercast(global, self.type_ptr())
}

_ if name.as_str().starts_with("simd_") => {
// Unpack non-power-of-2 #[repr(packed, simd)] arguments.
// This gives them the expected layout of a regular #[repr(simd)] vector.
7 changes: 7 additions & 0 deletions compiler/rustc_codegen_llvm/src/llvm/ffi.rs
@@ -1907,6 +1907,13 @@ unsafe extern "C" {
NameLen: size_t,
T: &'a Type,
) -> &'a Value;
pub(crate) fn LLVMRustGetOrInsertGlobalInAddrspace<'a>(
M: &'a Module,
Name: *const c_char,
NameLen: size_t,
T: &'a Type,
AddressSpace: c_uint,
) -> &'a Value;
pub(crate) fn LLVMRustInsertPrivateGlobal<'a>(M: &'a Module, T: &'a Type) -> &'a Value;
pub(crate) fn LLVMRustGetNamedValue(
M: &Module,
1 change: 1 addition & 0 deletions compiler/rustc_codegen_ssa/src/mir/intrinsic.rs
@@ -110,6 +110,7 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> {
sym::abort
| sym::unreachable
| sym::cold_path
| sym::dynamic_shared_memory
| sym::breakpoint
| sym::assert_zero_valid
| sym::assert_mem_uninitialized_valid
2 changes: 2 additions & 0 deletions compiler/rustc_hir_analysis/src/check/intrinsic.rs
@@ -74,6 +74,7 @@ fn intrinsic_operation_unsafety(tcx: TyCtxt<'_>, intrinsic_id: LocalDefId) -> hi
| sym::align_of
| sym::needs_drop
| sym::caller_location
| sym::dynamic_shared_memory
| sym::add_with_overflow
| sym::sub_with_overflow
| sym::mul_with_overflow
@@ -213,6 +214,7 @@
}
sym::rustc_peek => (1, 0, vec![param(0)], param(0)),
sym::caller_location => (0, 0, vec![], tcx.caller_location_ty()),
sym::dynamic_shared_memory => (1, 0, vec![], Ty::new_mut_ptr(tcx, param(0))),
sym::assert_inhabited | sym::assert_zero_valid | sym::assert_mem_uninitialized_valid => {
(1, 0, vec![], tcx.types.unit)
}
21 changes: 16 additions & 5 deletions compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp
@@ -209,10 +209,10 @@ extern "C" LLVMValueRef LLVMRustGetOrInsertFunction(LLVMModuleRef M,
.getCallee());
}

extern "C" LLVMValueRef LLVMRustGetOrInsertGlobal(LLVMModuleRef M,
const char *Name,
size_t NameLen,
LLVMTypeRef Ty) {
extern "C" LLVMValueRef
LLVMRustGetOrInsertGlobalInAddrspace(LLVMModuleRef M, const char *Name,
size_t NameLen, LLVMTypeRef Ty,
unsigned AddressSpace) {
Module *Mod = unwrap(M);
auto NameRef = StringRef(Name, NameLen);

@@ -223,10 +223,21 @@ extern "C" LLVMValueRef LLVMRustGetOrInsertGlobal(LLVMModuleRef M,
GlobalVariable *GV = Mod->getGlobalVariable(NameRef, true);
if (!GV)
GV = new GlobalVariable(*Mod, unwrap(Ty), false,
GlobalValue::ExternalLinkage, nullptr, NameRef);
GlobalValue::ExternalLinkage, nullptr, NameRef,
nullptr, GlobalValue::NotThreadLocal, AddressSpace);
return wrap(GV);
}

extern "C" LLVMValueRef LLVMRustGetOrInsertGlobal(LLVMModuleRef M,
const char *Name,
size_t NameLen,
LLVMTypeRef Ty) {
Module *Mod = unwrap(M);
unsigned AddressSpace = Mod->getDataLayout().getDefaultGlobalsAddressSpace();
return LLVMRustGetOrInsertGlobalInAddrspace(M, Name, NameLen, Ty,
AddressSpace);
}

extern "C" LLVMValueRef LLVMRustInsertPrivateGlobal(LLVMModuleRef M,
LLVMTypeRef Ty) {
return wrap(new GlobalVariable(*unwrap(M), unwrap(Ty), false,
1 change: 1 addition & 0 deletions compiler/rustc_span/src/symbol.rs
@@ -903,6 +903,7 @@ symbols! {
dyn_star,
dyn_trait,
dynamic_no_pic: "dynamic-no-pic",
dynamic_shared_memory,
e,
edition_panic,
effective_target_features,
17 changes: 17 additions & 0 deletions library/core/src/intrinsics/mod.rs
@@ -3238,6 +3238,23 @@ pub(crate) const fn miri_promise_symbolic_alignment(ptr: *const (), align: usize
)
}

/// Returns a pointer to dynamic shared memory.
Member:

I know this is a short description, but it is just repeating the function signature:

pub fn dynamic_shared_memory<T: ?Sized>() -> *mut T;
/*     ^^^^^^^^^^^^^^^^^^^^               ^^ ^^^^^^
       |                                  |  |
       |                                  |  |
       -- "dynamic shared memory"         |  - "a pointer"
                                          - "returns"
*/

I'm basically repeating the same comment I already made about "shared with what?" except with adding that "dynamic" also is pretty vague since in computer programming it's always relative to some notion of "static". Yet the "static" is not obviously present in this description.

///
/// The returned pointer is the start of the dynamic shared memory region.
/// All pointers returned by `dynamic_shared_memory` point to the same address,
/// so alias the same memory.
Comment on lines +3244 to +3245
Member:

Does this mean that all dynamic_shared_memory::<T> for the same T return the same pointer, or do even dynamic_shared_memory::<T> and dynamic_shared_memory::<U> point to the same memory?

Contributor Author:

All of them alias, independent of the type.
It’s probably worth explaining the concept of shared memory in the comment?

Maybe this makes it clearer:

> Returns a pointer to dynamic shared memory.
>
> Shared memory is a memory region that is shared between all threads in
> the same block/work-group. It is usually faster than global memory, which is
> shared between all threads on a GPU.
> Dynamic shared memory is in the shared memory region, though the allocated
> size is specified late, when launching a gpu-kernel.
>
> The pointer returned by dynamic_shared_memory() is the start of the dynamic
> shared memory region. All calls to dynamic_shared_memory in a block/work-group,
> independent of the generic type, return the same address, so alias the same memory.
> The returned pointer is aligned by at least the alignment of T.
>
> Other APIs
>
> CUDA and HIP call this shared memory, shared between threads in a block.
> OpenCL and SYCL call this local memory, shared between threads in a work-group.
> GLSL calls this shared memory, shared between invocations in a work group.
> DirectX calls this groupshared memory, shared between threads in a thread-group.

Member (@RalfJung), Sep 4, 2025:

> It’s probably worth explaining the concept of shared memory in the comment?

It's probably worth using a more specific term ("GPU shared memory" or so) since many people reading this will think "shared memory" refers to its more standard meaning of memory shared across different processes (often set up via mmap). It's unfortunate that GPU vendors chose to overload this term, but when working in a more general-purpose codebase you can't just assume everyone to know the conventions of the GPU community, and you can't give general-purpose terms a GPU-specific meaning without risking confusion.

Member:

> Other APIs
>
> CUDA and HIP call this shared memory, shared between threads in a block.
> OpenCL and SYCL call this local memory, shared between threads in a work-group.
> GLSL calls this shared memory, shared between invocations in a work group.
> DirectX calls this groupshared memory, shared between threads in a thread-group.

This sort of "translation guide" is not actually useful if you are not familiar with any of these things, so I would just leave it out as it is a distraction from the actual description. Especially since it's very easy to go look up a definition of, say, OpenCL's local memory, see it referred to as "this is GLSL's shared memory", look up a definition of that and see it referred to as basically the same idea as groupshared memory in DirectX, then look up a definition of that and... you get the idea. Our definition should stand on its own.

Member:

The translation guide might be useful to people that are familiar with these things and wondering why we are making up our own terms.

Member (@workingjubilee), Sep 5, 2025:

Then I will merely continue to insist that the description should make sense without reference to prevent infinite regress.

Exceedingly fine details, of course, can be handled elsewhere by other sources, but the concepts should be usefully clear here.

Member:

For sure, we need something reasonably concise such that a rust compiler dev with zero GPU knowledge has a rough idea of what this does after reading the description. I don't think that's that hard, GPUs aren't that special, but they use a lot of "weird" (read: grown-over-time) terminology that presents a big barrier to entry.

/// The returned pointer is aligned by at least the alignment of `T`.
Member (@RalfJung), Sep 4, 2025:

Speaking of safety requirements... how does one use this pointer? I get that it is aligned, but does it point to enough memory to store a T? If it's always the same address, doesn't everyone overwrite each other's data all the time? This API looks very odd for a non-GPU person, and it's not clear to me whether that is resolved by having more magic behavior (which should be documented or at least referenced here), or whether there's higher-level APIs built on top that deal with this (but this intrinsic provides so few guarantees, I can't see how that should be possible).

Typically, intrinsic documentations should be detailed enough that I can read and write code using the intrinsic and know exactly whether the code is correct and what it will do in all circumstances. I don't know if there's any hope of achieving that with GPU intrinsics, but if not then we need to have a bit of a wider discussion -- we have had bad experience with just importing "externally defined" semantics into Rust without considering all the interactions (in general, it is not logically coherent to have semantics externally defined).

The current docs would let me implement this intrinsic by just always returning 1024, and emitting a compile error if T has alignment bigger than 1024. I doubt that's a legal implementation. But that means the docs are not precise enough to describe what the implementation must do.

Member:

Is there some prior discussion of the design decision to determine the alignment by giving a type parameter? I could also be a const generic parameter, for instance. I don't have an opinion on the matter since I am an outsider to the GPU world, but as a compiler team member it'd be good to know if this is something you thought about for 5 minutes or whether there's some sort of larger design by a team that has a vision of how all these things will fit together.

Contributor Author:

There is some discussion in #135516. I don’t mind either way, I thought (for 5 minutes ;)) that specifying the type of the returned pointer makes sense.
I’m not much of a GPU programmer, but I think in most cases, you would store an array in dynamic shared memory, or maybe a struct followed by a dynamically sized array (or maybe two/n arrays of different types).

For just a struct, static shared memory would make more sense, though we don’t support that yet (there’s some discussion in the tracking issue, but I think that’s more complicated to design and implement).

///
/// # Other APIs
///
/// CUDA and HIP call this shared memory.
/// OpenCL and SYCL call this local memory.
#[rustc_intrinsic]
#[rustc_nounwind]
#[unstable(feature = "dynamic_shared_memory", issue = "135513")]
#[cfg(any(target_arch = "amdgpu", target_arch = "nvptx64"))]
pub fn dynamic_shared_memory<T: ?Sized>() -> *mut T;
Member (@RalfJung), Sep 4, 2025:

Note that outside the GPU world, "shared memory" typically refers to memory shared between processes. So I would suggest using a name that's less likely to be confused, like something that explicitly involves "GPU" or so.

This sounds like a form of "global" memory (similar to a static item), but then apparently OpenCL calls it "local" which is very confusing...

Contributor Author (@Flakebi), Sep 4, 2025:

Does it make sense to add a mod gpu?
I think there are more intrinsics for gpus that can be added (although more in the traditional intrinsic sense, relating to an instruction; edit: re-exposing intrinsics from core::arch::nvptx and the amdgpu equivalent).

Contributor Author:

Or should it be in core::arch::gpu?
(From #135516 (comment), cc @workingjubilee)

Member (@RalfJung), Sep 4, 2025:

Rust intrinsic names are not namespaced. They are exposed in a module, but inside the compiler they are identified entirely by their name. So moving them into a different module doesn't alleviate the need for a clear name that will be understandable to non-GPU people working in the compiler (which is the vast majority of compiler devs).

If there's more GPU intrinsics to come, moving them into a gpu.rs file here still might make sense.

I don't have a strong opinion on how the eventually stable public API is organized, I am commenting entirely as someone who has an interest in keeping the set of intrinsics the Rust compiler offers understandable and well-defined (the ones in this folder, not the ones in core::arch which you call "more traditional" but that's very dependent on your background ;). These intrinsics are just an implementation detail, but every intrinsic we add here is a new language primitive -- it's like adding a new keyword, just without the syntax discussions and perma-unstable. In the past we used to have intrinsics that entirely break the internal consistency of the language, and we used to have intrinsics whose safety requirements were very poorly documented.

Member:

The ?Sized here is almost certainly wrong. The implementation uses align_of which only makes sense for sized types. Unsized types don't have an alignment by themselves, you need to know the wide ptr metadata to determine the alignment.


/// Copies the current location of arglist `src` to the arglist `dst`.
///
/// FIXME: document safety requirements
8 changes: 8 additions & 0 deletions src/bootstrap/src/core/build_steps/compile.rs
@@ -668,6 +668,14 @@ pub fn std_cargo(builder: &Builder<'_>, target: TargetSelection, cargo: &mut Car
cargo.rustflag("-Cforce-unwind-tables=yes");
}

// amdgcn must have a cpu specified, otherwise it refuses to compile.
// We want to be able to run tests for amdgcn that depend on core, therefore
// we need to be able to compile core.
// The cpu used here must match in tests that use the standard library.
if target.contains("amdgcn") && target.file.is_none() {
cargo.rustflag("-Ctarget-cpu=gfx900");
}

// Enable frame pointers by default for the library. Note that they are still controlled by a
// separate setting for the compiler.
cargo.rustflag("-Zunstable-options");
1 change: 1 addition & 0 deletions src/build_helper/src/targets.rs
@@ -6,6 +6,7 @@
// `compiletest`.
pub fn target_supports_std(target_tuple: &str) -> bool {
!(target_tuple.contains("-none")
|| target_tuple.contains("amdgcn")
|| target_tuple.contains("nvptx")
|| target_tuple.contains("switch"))
}
1 change: 1 addition & 0 deletions src/tools/compiletest/src/directives/directive_names.rs
@@ -188,6 +188,7 @@ pub(crate) const KNOWN_DIRECTIVE_NAMES: &[&str] = &[
"only-aarch64",
"only-aarch64-apple-darwin",
"only-aarch64-unknown-linux-gnu",
"only-amdgpu",
"only-apple",
"only-arm",
"only-avr",
27 changes: 27 additions & 0 deletions tests/codegen-llvm/dynamic_shared_memory.rs
@@ -0,0 +1,27 @@
// Checks that dynamic_shared_memory works.

//@ revisions: amdgpu nvptx x86
//@ compile-flags: --crate-type=rlib
//
//@ [amdgpu] compile-flags: --target amdgcn-amd-amdhsa -Ctarget-cpu=gfx900
//@ [amdgpu] only-amdgpu
//@ [amdgpu] needs-llvm-components: amdgpu
//@ [nvptx] compile-flags: --target nvptx64-nvidia-cuda
//@ [nvptx] only-nvptx64
//@ [nvptx] needs-llvm-components: nvptx
//@ [x86] compile-flags: --target x86_64-unknown-linux-gnu
//@ [x86] only-x86_64
Member:

This is not enough when running on x86_64 Windows. You'd need only-x86_64-unknown-linux-gnu I think.

//@ [x86] needs-llvm-components: x86
//@ [x86] should-fail
#![feature(core_intrinsics, dynamic_shared_memory)]
#![no_std]
Member:

Would using mini-core instead of the precompiled core in the sysroot work instead of hard-coding -Ctarget-cpu=gfx900 in the build system? That way you can also omit the only-amdgpu and only-nvptx64 directives and instead make it run whenever LLVM supports it.

Contributor (@Teapot4195), Sep 4, 2025:

I don't think hardcoding gfx900 as the target will work at all, different uarches can have wildly different codegen'd binaries and are usually not backwards compatible. I think this is a case where we want the user to specify the target architecture that they are compiling for.

This would probably work for smoke testing but in the long term it's probably going to cause more pains than it's worth.

Member:

Given that this is just about local testing that the compiler works at all, none of the binary compatibility concerns really matter to us. This code is not executed.

But I agree that we should probably just use the minicore for this instead of adding code to the build system that we might have to rip out later. It's easy to update tests... unless they are built on a fragile assumption in the build system.

Contributor Author:

I could probably duplicate the intrinsic in minicore, but wouldn’t that defeat the purpose of this test?
The goal here is to test the intrinsic that is added to core.

> I think this is a case where we want the user to specify the target architecture that they are compiling for.

Sure, that’s what the target enforces :) Unfortunately it’s the exact thing that throws a wrench into writing tests for the standard library.

Member:

Duplicating the intrinsic declaration is fine; what you are testing is the implementation and that is not being duplicated.


use core::intrinsics::dynamic_shared_memory;

// CHECK: @dynamic_shared_memory = external addrspace(3) global [0 x i8], align 8
// CHECK: ret ptr addrspacecast (ptr addrspace(3) @dynamic_shared_memory to ptr)
pub fn fun() -> *mut i32 {
let res = dynamic_shared_memory::<i32>();
dynamic_shared_memory::<f64>(); // Increase alignment to 8
res
}
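Editor's note: the contract the test checks — every call returns the same address, regardless of the generic type, and that address is aligned for each requested type — can be made concrete with a CPU-side stand-in. The stub below is a hypothetical illustration (the name `dynamic_shared_memory_stub` and the fixed 16-byte over-alignment are invented, not the real intrinsic):

```rust
use std::sync::OnceLock;

// One backing region, over-aligned so it satisfies align_of::<T>() for the
// types used below; the real global's alignment is instead raised by codegen
// to the maximum over all surviving call sites.
#[repr(align(16))]
struct SharedRegion([u8; 1024]);

static REGION: OnceLock<Box<SharedRegion>> = OnceLock::new();

// Hypothetical stand-in for `dynamic_shared_memory::<T>()`.
fn dynamic_shared_memory_stub<T>() -> *mut T {
    let region: &SharedRegion =
        REGION.get_or_init(|| Box::new(SharedRegion([0; 1024])));
    region as *const SharedRegion as *mut T
}

fn main() {
    let p_i32 = dynamic_shared_memory_stub::<i32>();
    let p_f64 = dynamic_shared_memory_stub::<f64>();
    // All calls return the same address, independent of the generic type.
    assert_eq!(p_i32 as usize, p_f64 as usize);
    // The pointer is sufficiently aligned for each requested type.
    assert_eq!(p_i32 as usize % std::mem::align_of::<i32>(), 0);
    assert_eq!(p_f64 as usize % std::mem::align_of::<f64>(), 0);
}
```

Writing through pointers of different types to the one aliased region is, of course, where the safety obligations discussed in this thread come in.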