
Conversation

@Flakebi
Contributor

@Flakebi Flakebi commented Dec 28, 2025

Summary

This is my proposed implementation for exposing address spaces from rustc to core and nightly Rust.

It adds a struct AddrspacePtr<T: 'static, const ADDRSPACE: u32> to represent pointers into different address spaces.
This struct translates directly into ptr addrspace(ADDRSPACE) in llvm, so it can be used with llvm intrinsics and more.

Details

Discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Adding.20support.20for.20non-0.20address.20spaces.20in.20Rust/with/565558602
Related discussion: https://internals.rust-lang.org/t/naming-gpu-things-in-the-rust-compiler-and-standard-library/23833/6

What is an address space?

Some hardware, e.g. GPUs, has different physical memory regions.
To access them, the compiler uses different types of pointers.
On the hardware side, this results in different instructions being used to access each address space.

The different types of pointers have different properties:

  • Different sizes (e.g. 64-bit, 32-bit or 160-bit)
  • What is null? (e.g. 0 or -1)
  • Can the pointer be converted to an int? (or is it “non-integral”)

Why expose address spaces in Rust?

Access to address spaces is necessary to implement parts of the standard library (for some targets, not for x86 obviously 😉).
Concretely, accessing the launch and workgroup size on amdgpu needs to call the llvm.amdgcn.dispatch.ptr intrinsic, which returns a ptr addrspace(4).
Right now, this return type – and therefore the intrinsic – is not representable in Rust.

Alternatives

Option 1: It is possible to implement most amdgpu intrinsic support without general address space support.
Needed intrinsics could be implemented as #[rustc_intrinsic], where the backend calls the llvm intrinsic and then does an addrspacecast to the generic/default addrspace(0), so a pointer that is representable in Rust is returned.
Needing a rustc_intrinsic for every llvm intrinsic that uses a non-generic address space does not scale that well though.
It would also be hard (or in some cases impossible?) to work with pointers that cannot be converted back and forth to addrspace(0) and are non-integral (cannot be converted to/from integers either), like addrspace(7) in amdgpu.
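
A minimal sketch of what option 1 could look like for the dispatch-pointer case; the signature matches the follow-up intrinsic referenced later in this thread, everything else here is an illustration rather than part of this PR:

```rust
// Option 1 sketch: one dedicated rustc intrinsic per affected LLVM intrinsic.
// For this one, the LLVM backend would emit a call to
// llvm.amdgcn.dispatch.ptr (returning a ptr addrspace(4)) followed by an
// addrspacecast to the generic address space, so the Rust signature only
// ever sees an ordinary pointer.
#[rustc_intrinsic]
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```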

Option 2: In the other direction, make all pointers have an address space, instead of introducing a separate, new type.
The type layout machinery already supports pointers with address spaces (this is also used for AddrspacePtr), so adding an address space to the Rust raw pointer type would be a natural extension (maybe even doing the same for references).
oli-obk mentioned that macros in type position could be used to write down the type without needing new syntax.
It adds more to the implementation, as every occurrence of ty::RawPtr needs to be adjusted for a new parameter.
If we want to expose (raw) address spaces to Rust users in stable, this is the way to go. If we do not want to expose address spaces in their raw form (or we are undecided) and only use them as an implementation detail in std, I think we are better off with a more contained implementation like AddrspacePtr that does not leak into every raw pointer type.

Design

My expectation is that apart from enabling core features for GPUs (or potentially other targets), explicit addrspaces will not be a widely used feature.
I.e. it will be used by core, std and maybe low-level libraries, but it won’t see much direct use by Rust’s users.
Therefore this PR does not try to introduce new syntax to the language but uses a struct as a pointer and const generics for the address space.
I also did not replicate all of the ptr helpers for that reason. When there is a use-case, more can be added.

In theory, the existing ptr intrinsics (offset/read/write/…) could be re-implemented in core using the addrspace_ptr intrinsics, but that would cause (slight) overhead on the common path (it always needs a conversion from ptr to addrspace_ptr), so I didn’t try it.
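
For illustration, a hedged sketch of such a re-implementation (the conversion helper and method names are assumptions, not APIs added by this PR):

```rust
// Hypothetical: ptr::read expressed on top of the addrspace_ptr machinery.
// The extra step is converting the raw pointer (generic address space) into
// an AddrspacePtr first, which is the (slight) overhead mentioned above.
pub unsafe fn read<T: 'static>(src: *const T) -> T {
    // `from_mut_ptr` and `read` are assumed names for this sketch.
    let p: AddrspacePtr<T, { addrspace::GENERIC }> = AddrspacePtr::from_mut_ptr(src as *mut T);
    unsafe { p.read() }
}
```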

There is just one AddrspacePtr type for const/mut pointers, for simplicity. (I assume that raw pointers are opaque enough that it is valid behavior to e.g. convert a const & reference to a *mut pointer and back, as long as the memory behind the *mut pointer is only read but not written?)
Functionality that exists on raw pointers but is not implemented on AddrspacePtr here includes read/write_volatile, copy(_nonoverlapping) and atomic operations for now (I’m mentioning these just in case someone sees potential problems with the current design and with adding them in the future).
Just to mention it for completeness, something related to AddrspacePtr that we would like to have in the future is the ability to declare static variables in a certain address space (for GPU workgroup memory).

Implementation

A new lang item addrspace_ptr_type is added. It must be a struct AddrspacePtr<T: 'static, const ADDRSPACE: u32>.
The content of this struct as written in the Rust source is ignored when computing the type layout. It is replaced by a pointer into address space ADDRSPACE.

As the content of the struct is replaced inside the compiler, the content cannot be “looked at” in the Rust source.
To do something with the type (more than just holding it), rustc_intrinsics are used.

Intrinsics are added to

  • cast between AddrspacePtr<T, addrspace::GENERIC> and *mut T (just a syntactical type conversion)
  • cast to a different type T (a no-op, just type conversion) or cast to a different address space (addrspacecast in llvm)
  • convert to an integer (ptrtoint in llvm)
  • add to the offset (ptr::[wrapping_]offset, getelementptr in llvm)
  • and read/write from the pointer (load/store in llvm)

This should satisfy all basic needs for operations on pointers.
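
A hedged usage sketch of these operations; the method names mirror the functions in this PR’s codegen test (addr, offset, read, write), but the exact API surface should be treated as an assumption:

```rust
// Sketch: working with a pointer into the amdgpu workgroup ("local") address
// space, which is addrspace(3) in the LLVM amdgpu backend and 32 bits wide
// per the target data layout. Each call notes the LLVM operation it lowers to.
unsafe fn bump(ptr: AddrspacePtr<u32, 3>) -> u32 {
    let val = unsafe { ptr.read() };        // load from a ptr addrspace(3)
    unsafe { ptr.write(val + 1) };          // store to a ptr addrspace(3)
    let _next = unsafe { ptr.offset(1) };   // getelementptr
    let _addr = ptr.addr();                 // ptrtoint, yielding a 32-bit address
    val
}
```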

It is disallowed to deref an AddrspacePtr directly in Rust; read/write should be used instead.
Rustc still uses deref internally to implement read/write; this mirrors raw pointers.

Somewhat Open Points

Somewhat new introductions where I am unsure if there’s a better way:

  • There are now target-specific address spaces in core::ptr::addrspace, besides addrspace::GENERIC; so far core::ptr was target-agnostic
  • AddrspacePtr is (to my knowledge) the first type that has its content/layout defined inside the compiler
  • The current restriction is T: 'static (and an implicit Sized), should it be something different?

r? @CAD97 (assigning you because we already discussed in the Discourse post, feel free to re-assign)

@rustbot
Collaborator

rustbot commented Dec 28, 2025

⚠️ #[rustc_allow_const_fn_unstable] needs careful audit to avoid accidentally exposing unstable implementation details on stable.

cc @rust-lang/wg-const-eval

This PR modifies tests/auxiliary/minicore.rs.

cc @jieyouxu

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 28, 2025
@rust-log-analyzer
Collaborator

The job aarch64-gnu-llvm-20-1 failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

---- [codegen] tests/codegen-llvm/addrspace-ptr-basic.rs stdout ----
------FileCheck stdout------------------------------

------FileCheck stderr------------------------------
/checkout/tests/codegen-llvm/addrspace-ptr-basic.rs:216:12: error: CHECK: expected string not found in input
 // CHECK: %[[val:[^ ]+]] = tail call noundef ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
           ^
/checkout/obj/build/aarch64-unknown-linux-gnu/test/codegen-llvm/addrspace-ptr-basic/addrspace-ptr-basic.ll:64:62: note: scanning from here
define noundef nonnull ptr addrspace(4) @get_raw_dispatch_ptr() unnamed_addr #0 {
                                                             ^
/checkout/obj/build/aarch64-unknown-linux-gnu/test/codegen-llvm/addrspace-ptr-basic/addrspace-ptr-basic.ll:66:2: note: possible intended match here
 %_0 = tail call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
 ^
/checkout/tests/codegen-llvm/addrspace-ptr-basic.rs:224:12: error: CHECK: expected string not found in input
 // CHECK: %[[val:[^ ]+]] = tail call noundef ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
           ^
/checkout/obj/build/aarch64-unknown-linux-gnu/test/codegen-llvm/addrspace-ptr-basic/addrspace-ptr-basic.ll:71:53: note: scanning from here
define noundef nonnull align 2 ptr @get_dispatch_ptr() unnamed_addr #0 {
                                                    ^
/checkout/obj/build/aarch64-unknown-linux-gnu/test/codegen-llvm/addrspace-ptr-basic/addrspace-ptr-basic.ll:73:2: note: possible intended match here
 %_3 = tail call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
 ^

Input file: /checkout/obj/build/aarch64-unknown-linux-gnu/test/codegen-llvm/addrspace-ptr-basic/addrspace-ptr-basic.ll
Check file: /checkout/tests/codegen-llvm/addrspace-ptr-basic.rs

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             1: ; ModuleID = 'addrspace_ptr_basic.3673a756f765de4c-cgu.0' 
             2: source_filename = "addrspace_ptr_basic.3673a756f765de4c-cgu.0" 
             3: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9" 
             4: target triple = "amdgcn-amd-amdhsa" 
             5:  
             6: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
             7: define noundef i32 @addr(ptr addrspace(3) noundef %ptr) unnamed_addr #0 { 
             8: start: 
             9:  %0 = ptrtoint ptr addrspace(3) %ptr to i32 
            10:  ret i32 %0 
            11: } 
            12:  
            13: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
            14: define noundef ptr addrspace(3) @cast(ptr addrspace(3) noundef readnone returned %ptr) unnamed_addr #0 { 
            15: start: 
            16:  ret ptr addrspace(3) %ptr 
            17: } 
            18:  
            19: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
            20: define noundef ptr @cast_addrspace(ptr addrspace(3) noundef readnone %ptr) unnamed_addr #0 { 
            21: start: 
            22:  %0 = addrspacecast ptr addrspace(3) %ptr to ptr 
            23:  ret ptr %0 
            24: } 
            25:  
            26: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
            27: define noundef ptr addrspace(3) @offset(ptr addrspace(3) noundef readnone %ptr) unnamed_addr #0 { 
            28: start: 
            29:  %_0 = getelementptr inbounds i8, ptr addrspace(3) %ptr, i32 -20 
            30:  ret ptr addrspace(3) %_0 
            31: } 
            32:  
            33: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
            34: define noundef ptr addrspace(3) @wrapping_offset(ptr addrspace(3) noundef readnone %ptr) unnamed_addr #0 { 
            35: start: 
            36:  %0 = getelementptr i8, ptr addrspace(3) %ptr, i32 -15 
            37:  ret ptr addrspace(3) %0 
            38: } 
            39:  
            40: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) uwtable 
            41: define noundef i32 @read(ptr addrspace(3) nocapture noundef readonly %ptr) unnamed_addr #1 { 
            42: start: 
            43:  %_0 = load i32, ptr addrspace(3) %ptr, align 4, !noundef !2 
            44:  ret i32 %_0 
            45: } 
            46:  
            47: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable 
            48: define void @write(ptr addrspace(3) nocapture noundef writeonly initializes((0, 4)) %ptr, i32 noundef %val) unnamed_addr #2 { 
            49: start: 
            50:  store i32 %val, ptr addrspace(3) %ptr, align 4 
            51:  ret void 
            52: } 
            53:  
            54: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
            55: define { ptr addrspace(3), i64 } @fun(ptr noundef %ptr) unnamed_addr #0 { 
            56: start: 
            57:  %0 = addrspacecast ptr %ptr to ptr addrspace(3) 
            58:  %1 = insertvalue { ptr addrspace(3), i64 } poison, ptr addrspace(3) %0, 0 
            59:  %2 = insertvalue { ptr addrspace(3), i64 } %1, i64 4, 1 
            60:  ret { ptr addrspace(3), i64 } %2 
            61: } 
            62:  
            63: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
            64: define noundef nonnull ptr addrspace(4) @get_raw_dispatch_ptr() unnamed_addr #0 { 
check:216'0                                                                  X~~~~~~~~~~~~~~~~~~~~ error: no match found
            65: start: 
check:216'0     ~~~~~~~
            66:  %_0 = tail call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr() 
check:216'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:216'1      ?                                                             possible intended match
            67:  ret ptr addrspace(4) %_0 
check:216'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~
            68: } 
check:216'0     ~~
            69:  
check:216'0     ~
            70: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
check:216'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            71: define noundef nonnull align 2 ptr @get_dispatch_ptr() unnamed_addr #0 { 
check:216'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:224'0                                                         X~~~~~~~~~~~~~~~~~~~~ error: no match found
            72: start: 
check:224'0     ~~~~~~~
            73:  %_3 = tail call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr() 
check:224'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:224'1      ?                                                             possible intended match
            74:  %0 = addrspacecast ptr addrspace(4) %_3 to ptr 
check:224'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            75:  ret ptr %0 
check:224'0     ~~~~~~~~~~~~
            76: } 
check:224'0     ~~
            77:  
check:224'0     ~
            78: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable 
check:224'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            79: define noundef ptr addrspace(3) @read_ptr(ptr addrspace(4) nocapture noundef readonly %ptr) unnamed_addr #0 { 
check:224'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            80: start: 
            81:  %_0 = load ptr addrspace(3), ptr addrspace(4) %ptr, align 4, !noundef !2 
            82:  ret ptr addrspace(3) %_0 
            83: } 
            84:  
            85: ; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) 
            86: declare noundef nonnull align 4 ptr addrspace(4) @llvm.amdgcn.dispatch.ptr() unnamed_addr #3 
            87:  
            88: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "target-cpu"="gfx900" } 
            89: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) uwtable "target-cpu"="gfx900" } 
            90: attributes #2 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable "target-cpu"="gfx900" } 
            91: attributes #3 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) } 
            92:  
            93: !llvm.module.flags = !{!0} 
            94: !llvm.ident = !{!1} 
            95:  
            96: !0 = !{i32 8, !"PIC Level", i32 2} 
            97: !1 = !{!"rustc version 1.94.0-nightly (51e530a61 2025-12-28)"} 
            98: !2 = !{} 
>>>>>>

------------------------------------------

error: verification with 'FileCheck' failed
status: exit status: 1
command: "/usr/lib/llvm-20/bin/FileCheck" "--input-file" "/checkout/obj/build/aarch64-unknown-linux-gnu/test/codegen-llvm/addrspace-ptr-basic/addrspace-ptr-basic.ll" "/checkout/tests/codegen-llvm/addrspace-ptr-basic.rs" "--check-prefix=CHECK" "--allow-unused-prefixes" "--dump-input-context" "100"
stdout: none

---- [codegen] tests/codegen-llvm/addrspace-ptr-basic.rs stdout end ----

@CAD97
Contributor

CAD97 commented Dec 28, 2025

The approach looks reasonable to me, but I'm not that familiar with this part of the compiler and I'm not on the reviewers list, so

r? compiler

@rustbot rustbot assigned fee1-dead and unassigned CAD97 Dec 28, 2025
@tgross35 tgross35 added the T-lang Relevant to the language team label Dec 28, 2025
@RalfJung
Member

This seems to be a new language feature, so I would expect a bit more time spent on figuring out a proper design before we add this to the compiler. It may need an RFC. At the very least, we should have a description of what this type does that does not mention LLVM at all -- LLVM is an implementation detail of Rust, not something users should be exposed to, and it is not the only codegen backend.

#[unstable(feature = "ptr_addrspace", issue = "none")]
#[lang = "addrspace_ptr_type"]
#[allow(missing_debug_implementations)]
pub struct AddrspacePtr<T: 'static, const ADDRSPACE: u32> {
Member

This file is big enough. Please move the new things to a new file.

#[unstable(feature = "ptr_addrspace", issue = "none")]
#[cfg(any(doc, target_arch = "amdgpu", target_arch = "nvptx64"))]
#[doc(cfg(any(target_arch = "amdgpu", target_arch = "nvptx64")))]
pub const CONST: u32 = 4;
Member

@RalfJung RalfJung Dec 28, 2025

What are these random constants, where do they come from? If they are LLVM-specific, then they should not show up here -- after all, what should the cranelift/GCC backend or Miri do with them?

Contributor

They're essentially arbitrary identifiers for identifying what address space a nonlocal pointer points into.

The most correct representation would probably be a #[non_exhaustive] enum, but I don't know how difficult that would be to introspect in the compiler compared to a simple const generic.

Member

Arbitrary identifiers defined by who? Who is the receiver that interprets these numbers? Are they fixed by the target ABI or so, or is it just something internal to the LLVM codegen backends for these targets?

Contributor

@CAD97 CAD97 Dec 28, 2025

The names are defined by the OpenCL standard, "with some additions." The numbers "correspond to target architecture specific LLVM address space numbers used in LLVM IR" and are defined by the AMDGPU target.

https://llvm.org/docs/AMDGPUUsage.html#address-spaces

Member

The OpenCL standard specifically talks about LLVM...?!? Aren't there non-LLVM implementations?

Member

Oh, you said names, not numbers. The numbers are not in the standard, just in LLVM.

That makes it likely a bad idea to have them in library...

@rust-lang rust-lang deleted a comment from diffray-bot Dec 28, 2025
@rust-lang rust-lang deleted a comment from diffray-bot Dec 28, 2025
@rust-lang rust-lang deleted a comment from diffray-bot Dec 28, 2025
@rust-lang rust-lang deleted a comment from diffray-bot Dec 28, 2025
@CAD97
Contributor

CAD97 commented Dec 28, 2025

FWIW, I believe this is mostly intended to be used internally for the Rust-GPU work, rather than stabilization track. This feature reduced to absurdity is "what do we need to link to LLVM GPU intrinsics" and not intended for use beyond that.

At least currently, I don't think any target other than GPU targets and maybe WASM has any use for non-zero LLVM addrspace pointers.

Is there an @ group to ping for the GPU target?

///
/// # Safety
///
/// If the target's raw pointer address space matches the generic address space, this function
Contributor

the function returns a pointer in the generic address space, so why this condition?

Contributor Author

Pointers in Rust are mostly in the generic address space, but there are exceptions.
For example with the avr-none target, *const u32 would be in the generic address space (so, no problem to convert),
but *const fn() is in address space 1 (due to the P1 in the data layout), therefore converting it to the generic address space involves an address space cast.

If we say this must always be possible (well, one can write *const fn() as *const u32 in safe Rust and it results in an addrspacecast), I can remove this safety comment.

Contributor

Thanks for the explanation! Maybe the comment could be clarified to "If the target's address space for this pointer type (e.g. data pointer vs function pointer) matches the generic address space ..."

@Flakebi
Contributor Author

Flakebi commented Dec 29, 2025

Thanks for taking a look!

FWIW, I believe this is mostly intended to be used internally for the Rust-GPU work, rather than stabilization track. This feature reduced to absurdity is "what do we need to link to LLVM GPU intrinsics" and not intended for use beyond that.

Exactly that 😊

I want to call declare ptr addrspace(4) @llvm.amdgcn.dispatch.ptr() to implement core for amdgpu.
When I first tried that a year ago when starting the amdgpu target, I hacked rustc to change the return type to ptr addrspace(4) when it encounters the llvm.amdgcn.dispatch.ptr function name.
That is quite hacky, so I never opened a PR for it (I also didn’t have that much time outside the Christmas holidays ;).

As a stop-gap solution in the amdgpu-device-libs crate, I use linker-plugin-lto and link in an LLVM bitcode file that calls llvm.amdgcn.dispatch.ptr + an addrspacecast to addrspace(0) and exposes that as a function that can be called from Rust.
I use the bitcode linking anyway for println and allocator support (which is implemented in clang, not in libllvm) but this is not something I can use in Rust core.

When looking at it again this year – also keeping in mind that there are more intrinsics than dispatch.ptr that use addrspaces – I tried to find a generic way that doesn’t need changes in rustc for every affected intrinsic.
After getting the basics working (the change in layout_of_uncached), one thing led to another and I ended up with (almost) complete address space support…

So, I’m happy with any solution that lets me call llvm.amdgcn.dispatch.ptr.

I think a generic AddrspacePtr is quite powerful and useful for targets that use the LLVM backend (maybe something like core::arch::llvm?).
cranelift doesn’t support address spaces and will probably never support amdgpu.
gcc has some basic amdgpu support, though realistically I think they are far behind LLVM (which is developed by AMD themselves and receives a lot of support), so for the amdgpu Rust target it makes more sense to focus on the LLVM backend.
I didn’t find much documentation about gcc (amdgpu is not even mentioned in some places that describe gcc’s targets); from what I found, it seems gcc does not have address spaces, instead there is a "shared" attribute attached somewhere for workgroup memory (and that’s it feature-wise).

How do you prefer Rust to call llvm.amdgcn.dispatch.ptr and others?

@oli-obk
Contributor

oli-obk commented Dec 29, 2025

cc @ZuseZ4 @eddyb @Firestar99

@RalfJung
Member

RalfJung commented Dec 29, 2025

I think the main point is that whatever you add to library, you should be able to describe and explain it without ever mentioning anything LLVM-specific. Rust is more than a thin layer over LLVM, and I am not sure it's a good idea to have a bunch of code in rustc that has no chance of ever being stabilized because it is too tied to LLVM.

From the OpenCL docs, it seems like the concept of having multiple address spaces is not LLVM-specific. But these specific numbers you used are. So what about having an enum AddressSpace with variants for all existing address spaces, and then each codegen backend can translate this to however those address spaces are represented in its backend (e.g., to the appropriate address space numbers in LLVM)? You can see https://doc.rust-lang.org/nightly/std/intrinsics/enum.AtomicOrdering.html for an existing example of an enum used as const generic argument in intrinsics, and how codegen handles that.
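
A minimal sketch of that idea (the variant names are assumptions, loosely following the OpenCL memory regions quoted further down; each backend would map a variant to its own representation):

```rust
// Sketch only: a backend-agnostic address-space enum used as a const generic
// argument, analogous to core::intrinsics::AtomicOrdering. The variant set is
// an assumption, not something proposed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AddressSpace {
    Generic,
    Global,
    Constant,
    Workgroup, // OpenCL "local"
    Private,
}
// The LLVM backend would translate e.g. Workgroup to its target-specific
// address space number (addrspace(3) on amdgpu), while Miri, cranelift or GCC
// pick whatever representation suits them.
```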

We should also somewhere have documentation of what these address spaces even are. From the little I remember about LLVM address spaces, they have a peculiar way of using the term that's somewhat distinct from what one might expect. (For instance, I would by default expect every address space to be entirely independent of all the others, referring to its own disjoint domain of things. That makes the operation of casting between address spaces entirely meaningless. But apparently LLVM address spaces are something different, not about the meaning of the address but just about the exact instruction used to compile loads and stores? No idea why that's a part of the type rather than a flag at the load/store... I am probably misunderstanding something. But you get the idea. If you add new fundamental language concepts to Rust, we have to make sure they can be understood by all core stakeholders that relate to Rust semantics, including those with no background in GPU programming. It doesn't matter that libcore and libstd are the only intended direct users of that feature -- that means we can skip the syntax discussion, but there's no shortcut to properly defining the semantics of everything libcore uses.)

Note in particular that rustc needs to know the size and alignment of all its types without going through LLVM. For all these address spaces, everything relevant about them should be fully defined and documented inside Rust. There should be enough information such that supporting those address spaces in Miri is "just" a matter of implementation, not a matter of figuring out what exactly each operation even does.

@RalfJung
Member

RalfJung commented Dec 29, 2025

So, I’m happy with any solution that lets me call llvm.amdgcn.dispatch.ptr.

Just call it and then cast the result to address space 0? A Rust intrinsic can expand to multiple LLVM operations.

So far I don't understand why non-default address spaces ever need to be visible in the Rust code. But I don't really understand address spaces so that is not surprising. ;) The PR seems to assume the reader already knows LLVM's notion of address spaces and has accepted it as necessity.

EDIT: Ah you already talked about that...

Option 1: It is possible to implement most amdgpu intrinsic support without general address space support.
Needed intrinsics could be implemented as #[rustc_intrinsic], where the backend calls the llvm intrinsic and then does an addrspacecast to the generic/default addrspace(0), so a pointer that is representable in Rust is returned.
Needing a rustc_intrinsic for every llvm intrinsic that uses a non-generic address space does not scale that well though.
It would also be hard (or in some cases impossible?) to work with pointers that cannot be converted back and forth to addrspace(0) and are non-integral (cannot be converted to/from integers either), like addrspace(7) in amdgpu.

Directly calling LLVM intrinsics is a last resort and a big headache for everyone else.

pub struct AddrspacePtr<T: 'static, const ADDRSPACE: u32> {
// Struct implementation is replaced by the compiler.
// This field is here for using the generic argument but cannot be set or accessed in any way.
do_not_use: PhantomData<*const T>,
Member

This seems like a very fragile hack -- having fields that then get removed half-way through compilation. There's a huge risk of confusing some part of the compiler that still looks at those fields. Not even Box does something this cursed, and that type is already quite cursed...

Contributor Author

I agree with your sentiment, I thought the same when I initially tried to get it working.
It is similar to #[rustc_intrinsic]s with a dummy/fallback body that gets replaced during compilation.
I would argue that while fallback intrinsic bodies have the risk of getting inlined and therefore silently mis-compile a program, the dummy struct body here will always fail compilation if used so it’s actually safer than the intrinsics that rustc already uses.

Rustc has so far only replaced function bodies; replacing a type body is something new, which is why I explicitly called this out in the PR description (in Somewhat Open Points).

Note in particular that rustc needs to know the size and alignment of all its types without going through LLVM.

This works perfectly fine (see also the test I wrote in the PR), Rust knows the correct size and alignment because we replace the type layout of AddrspacePtr with a valid layout of a pointer to the specified address space.
Every access to the struct goes through the type layout, that’s where it fails compilation if some code tries to access the body.

Contributor Author

Re-reading my comment, I want to clarify it so it doesn’t come across wrongly: I wanted to express that replacing the struct body as implemented here is not as bad as it looks at first glance.
There are probably many ways to improve it and make it look better and more robust, but I haven’t found better places to modify so far. So for everyone who reads this comment: if you know more about the Rust compiler than I do (well, that’s a given ;)) and you have concrete or not-so-concrete ideas for improvement, let me hear them!

@Flakebi
Contributor Author

Flakebi commented Dec 29, 2025

Directly calling LLVM intrinsics is a last resort and a big headache for everyone else.

So, you prefer having a #[rustc_intrinsic] and implement that by the llvm intrinsic + an addrspacecast (alternative option 1)?
If we go down that route, we can skip all discussions about address spaces in the Rust language :)

I’ll try to answer this question anyway:

I would by default expect every address space to be entirely independent of all the others, referring to its own disjoint domain of things.

As far as I know, LLVM doesn’t really define what an address space is or how one is supposed to use it. It makes different pointer types with e.g. different sizes, but that’s about it.
What the different pointer types mean and do is entirely up to the target. It can be disjoint domains that cannot be converted between, but it doesn’t need to be. The target could also define address space 0–10 being the same (i.e. completely aliasing and an addrspacecast is a no-op). amdgpu has a mixture of these, some address spaces partially alias and can be converted between, some don’t.

@RalfJung
Member

So, you prefer having a #[rustc_intrinsic] and implement that by the llvm intrinsic + an addrspacecast (alternative option 1)?
If we go down that route, we can skip all discussions about address spaces in the Rust language :)

Well, it's the "obvious" alternative. It's hard for me to judge the trade-offs here as I don't have an overview over the design space. :) That's why it would help to have a bit of a design document for what the desired goal here looks like -- are we talking about a single intrinsic, or a dozen, or hundreds, etc?

@Flakebi
Contributor Author

Flakebi commented Dec 29, 2025

are we talking about a single intrinsic, or a dozen, or hundreds, etc?

Counting intrinsics (IR/IntrinsicsAMDGPU.td and IR/IntrinsicsNVVM.td in llvm) that use non-0 address spaces, there are about 80 (a few more that I skipped because they’re even more special purpose; plus I didn’t count non-intrinsics/normal operations on these address spaces like load/store/atomics).
I’d say most of those are about as common as SSE intrinsics on CPUs, i.e. never used by most people, but a minority that’s fine with nightly uses them to get the last few percent of performance.
(There’s mentions of address spaces in SPIR-V and WASM intrinsics but I didn’t find any actual intrinsics using specific address spaces.)

The more important ones, e.g. necessary to implement a useful std/core is about a dozen intrinsics.
(In case it’s not obvious, even for those more useful intrinsics, we don’t want to expose them directly in stable Rust but build an interface around them.)

@CAD97
Contributor

CAD97 commented Jan 6, 2026

For my 2¢, I do think it makes sense for the implementation primitive to be generic over the address space. But I also think it potentially makes more sense for the type public outside core/stdarch to be specific to each address space that has intrinsics which manipulate it.

A lang-item struct whose repr gets overridden is just a smaller surface (and can be provided completely independently from usual pointers). Something more like *mut builtin#addrspace "workgroup" T could work, but now you both potentially pollute standard pointer docs with very niche functionality and have to come up with some way to be generic over addrspace instead of just straightforward const generic surface.

That what addrspaces are available is cfg dependent also makes me lean more towards a library generic than a primitive per addrspace.

Changing the pointer repr based on pointee type à la fat pointers is perhaps the most fitting design, but alternative pointer reprs that aren't just *mut () plus metadata feels ill-advised.


Although, all that said, I have a bit of a bias towards presupposing that the compiler "should" support arbitrary addrspaces in a uniform manner instead of each one directly. If that predicate is incorrect, my thoughts here become moot.

@jhpratt
Member

jhpratt commented Jan 7, 2026

This appears to be firmly in compiler territory, which I'm not a reviewer for.

r? compiler

@rustbot rustbot assigned jdonszelmann and unassigned jhpratt Jan 7, 2026
@workingjubilee
Member

r? @workingjubilee

I don't expect this convo to go anywhere at the moment, I'm just taking this off the hot potato cycle while we figure out what we even want to do in this arena of thought.

@workingjubilee
Member

workingjubilee commented Jan 7, 2026

Changing the pointer repr based on pointee type à la fat pointers is perhaps the most fitting design, but alternative pointer reprs that aren't just *mut () plus metadata feels ill-advised.

@CAD97 Mm, I feel that's not very coherent here, if I'm understanding you correctly?

Maybe I'm not? Because it's not clear to me at all that we would determine the overall layout of the pointer based on the addrspace of the pointee.

For instance, a dyn Trait pointee, as I understand it, may have its other pointee... in practice, a vtable... be in the code addrspace, so that all the functions it calls can be a small hop away. So a fat pointer could have two addrspaces for its pointees.

@CAD97
Contributor

CAD97 commented Jan 7, 2026

@workingjubilee In a world where addrspace was set by pointee type, the data part of the pointer (i.e. *mut ()) would have a size dependent on addrspace... and I suspect non-default addrspaces would be incompatible with slice fat pointers and/or dyn trait objects. (and *mut dyn Trait is (*mut (), DynMetadata<dyn Trait>))

@workingjubilee
Member

Hmm. Well I definitely don't see why we couldn't have *mut [T] in a non-default addrspace.

...dyn Trait may be stressing a lot of things though yeah.

@RalfJung
Member

RalfJung commented Jan 7, 2026

A lang-item struct whose repr gets overridden is just a smaller surface (and can be provided completely independently from usual pointers). Something more like *mut builtin#addrspace "workgroup" T could work, but now you both potentially pollute standard pointer docs with very niche functionality and have to come up with some way to be generic over addrspace instead of just straightforward const generic surface.

I deliberately did not propose to extend the existing raw pointers with an addrspace parameter. But the "struct whose fields are replaced" is a weird hack that leads to a compiler with a split mind, where depending on which APIs are used one sees entirely different fields. This is the kind of design that leads to lots of hair-pulling, confusion, and cursing down the line when people who are not in this discussion (or who forgot about it) try to figure out what on Earth is going on.

Also, from a t-types and t-opsem perspective, having the type "only for internal use" simplifies absolutely nothing. We still have to fully specify how it is supposed to work and ensure it behaves correctly.

@workingjubilee
Member

Yeees, I think it's better to just embed "the ADT has a special layout" as, er, "the ADT has a special layout", rather than having magical disappearing/reappearing fields.

@RalfJung
Member

RalfJung commented Jan 7, 2026

We don't really have "ADTs whose size is determined by the backend" though. I'm not sure what best way is to represent such types. One could imagine something like

#[cfg(amdgpu)]
#[repr(addrspace(...))]
struct LocalPtr(...);

but then the library would still have to know the right size and alignment for the field.

The only types we have where size and alignment are entirely determined by the target are primitive types: usize, pointers.

Kobzol added a commit to Kobzol/rust that referenced this pull request Jan 7, 2026
…ubilee

Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang#135024

r? RalfJung as you are already aware of the background (feel free to re-assign)
@workingjubilee
Member

Pointers into other address spaces that have different layouts should have their own data layouts given in the data layout string of an LLVM target. I believe we already have some augmentation for the sake of this, it would be a matter of bringing that home.

@RalfJung
Member

RalfJung commented Jan 7, 2026

Yes, that's the "easy" part. The hard part is, how are those pointer types exposed in the language?

@Flakebi
Contributor Author

Flakebi commented Jan 7, 2026

When implementing this, I was searching for existing “compiler-defined” types.
It surprised me that there is none so far.
For functions, there is #[rustc_intrinsic]: One defines a function signature and rustc implements the body.
The type equivalent #[rustc_type](?) is missing: Defining a type “signature”, name and generic arguments, and rustc implementing the type body.

It was mentioned above that one could add a new TyKind ty::AddrspacePtr but that’s kind of a big hammer for a currently rather niche feature.
However, if compiler-implemented types are useful outside of addrspace pointers, maybe a new more generally useful TyKind can be added.
AddrspacePtr can be such an opaque, compiler-implemented type.

Pointers into other address spaces that have different layouts should have their own data layouts given in the data layout string of an LLVM target. I believe we already have some augmentation for the sake of this, it would be a matter of bringing that home.

Just to make sure we’re on the same page, you mean the type layout of pointers as defined in the data-layout string is different, right?
The layout of pointee types is (as far as I’m aware) always the same, independent of the address space; at least, LLVM has no way to express that they should be different (as there is only a single data layout).
(Almost) all of this is already implemented in current rustc. E.g. the avr-none target makes use of this.

For future reference, I found where gcc defines the amdgpu address spaces: https://github.com/gcc-mirror/gcc/blob/7132a4a945579f096b59a59460196f0f44fbe18b/gcc/config/gcn/gcn.h#L577 (not identical to llvm but similar)

@workingjubilee
Member

Just to make sure we’re on the same page, you mean the type layout of pointers as defined in the data-layout string is different, right?

Correct.

However, if compiler-implemented types are useful outside of addrspace pointers, maybe a new more generally useful TyKind can be added.

Unsure. I think it might be fine to leave them as ADTs, but extend rustc to be able to recognize that an ADT can have a "layout is an implementation detail" hook and that it isn't simply its recursive fields + tag. I have been working on a fork which does this for a different type: I simply treated the type as a lang_item and added new layout code.

@CAD97
Contributor

CAD97 commented Jan 7, 2026

It might not be obvious to everyone (it wasn't to me), but the non-default address spaces are already defined by the target datalayout string, and may be referred to by name (if provided) in addition to the index (which is required).

The most directly relevant part of The OpenCL Specification is:

§3.3.1. Fundamental Memory Regions

Memory in OpenCL is divided into two parts.

  • Host Memory: The memory directly available to the host. The detailed behavior of host memory is defined outside of OpenCL. Memory objects move between the Host and the devices through functions within the OpenCL API or through a shared virtual memory interface.
  • Device Memory: Memory directly available to kernels executing on OpenCL devices.

Device memory consists of four named address spaces or memory regions [LLVM address spaces]:

  • Global Memory: This memory region permits read/write access to all work-items in all work-groups running on any device within a context. Work-items can read from or write to any element of a memory object. Reads and writes to global memory may be cached depending on the capabilities of the device.
  • Constant Memory: A region of global memory that remains constant during the execution of a kernel-instance. The host allocates and initializes memory objects placed into constant memory.
  • Local Memory: A memory region local to a work-group. This memory region can be used to allocate variables that are shared by all work-items in that work-group.
  • Private Memory: A region of memory private to a work-item. Variables defined in one work-item’s private memory are not visible to another work-item.

The memory regions and their relationship to the OpenCL Platform model are summarized below. Local and private memories are always associated with a particular device. The global and constant memories, however, are shared between all devices within a given context. An OpenCL device may include a cache to support efficient access to these shared memories.

To understand memory in OpenCL, it is important to appreciate the relationships between these named address spaces. The four named address spaces available to a device are disjoint meaning they do not overlap. This is a logical relationship, however, and an implementation may choose to let these disjoint named address spaces share physical memory.

Programmers often need functions callable from kernels where the pointers manipulated by those functions can point to multiple named address spaces. This saves a programmer from the error-prone and wasteful practice of creating multiple copies of functions; one for each named address space. Therefore the global, local and private address spaces belong to a single generic address space. This is closely modeled after the concept of a generic address space used in the embedded C standard (ISO/IEC 9899:1999). Since they all belong to a single generic address space, the following properties are supported for pointers to named address spaces in device memory:

  • A pointer to the generic address space can be cast to a pointer to a global, local or private address space
  • A pointer to a global, local or private address space can be cast to a pointer to the generic address space.
  • A pointer to a global, local or private address space can be implicitly converted to a pointer to the generic address space, but the converse is not allowed.

The constant address space is disjoint from the generic address space.

[!NOTE]
The generic address space is missing before version 2.0.

The addresses of memory associated with memory objects in Global memory are not preserved between kernel instances, between a device and the host, and between devices. In this regard global memory acts as a global pool of memory objects rather than an address space. This restriction is relaxed when shared virtual memory (SVM) is used.

[!NOTE]
Shared virtual memory is missing before version 2.0.

SVM causes addresses to be meaningful between the host and all of the devices within a context hence supporting the use of pointer based data structures in OpenCL kernels. It logically extends a portion of the global memory into the host address space giving work-items access to the host address space. On platforms with hardware support for a shared address space between the host and one or more devices, SVM may also provide a more efficient way to share data between devices and the host. Details about SVM are presented in Shared Virtual Memory.


The generic address space, if supported (not all processors in the target support it), is our Rust *mut _. (I.e. LLVM's ptr or ptr addrspace(0), which are equivalent. I do not know how LLVM treats ptr on the processor targets that do not support the generic address space.)

Global, Local, and Private pointers are semantically closer to offset pointers into object pools than memory pointers into an address space. While conversion between a nonzero memory region and a Generic pointer is near-trivial (a no-op for Global or Constant; prepending/truncating a 32-bit register-value prefix for Local or Private), access through the Generic address space is meaningfully more expensive than access through the specific address space when the value is known to be in that memory region.
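
To make that concrete, a hedged sketch against the AddrspacePtr<T, ADDRSPACE> type proposed in this PR; the address-space numbers and the read/cast methods below are hypothetical, used purely for illustration:

```rust
// Sketch only: the constants and the `read`/`cast_to_generic` methods are
// hypothetical placeholders for whatever API this PR ends up exposing.
const GENERIC: u32 = 0; // addrspace(0)
const LOCAL: u32 = 3;   // illustrative number for a workgroup-local space

fn load_local(p: AddrspacePtr<u32, LOCAL>) -> u32 {
    // Access through the specific address space lets the backend emit the
    // cheap, region-specific load instruction.
    unsafe { p.read() } // hypothetical method
}

fn load_generic(p: AddrspacePtr<u32, LOCAL>) -> u32 {
    // Casting to the generic space first is allowed (per the rules quoted
    // above), but the access then goes through the slower generic path.
    let g: AddrspacePtr<u32, GENERIC> = p.cast_to_generic(); // hypothetical method
    unsafe { g.read() } // hypothetical method
}
```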

The OpenCL Standard, as far as I can tell, does not define the Region memory region / Global Data Store (GDS) / wavefront-local address space provided by LLVM AMDGPU. (The Local memory region / Local Data Store (LDS) is workgroup-local.)


The LLVM overview docs for non-integral pointer types are also relevant here:

Non-Integral Pointer Type

Note: non-integral pointer types are a work in progress, and they should be considered experimental at this time.

For most targets, the pointer representation is a direct mapping from the bitwise representation to the address of the underlying memory location. Such pointers are considered “integral”, and any pointers where the representation is not just an integer address are called “non-integral”.

Non-integral pointers have at least one of the following three properties:

  • the pointer representation contains non-address bits
  • the pointer representation is unstable (may change at any time in a target-specific way)
  • the pointer representation has external state

These properties (or combinations thereof) can be applied to pointers via the datalayout string.

The exact implications of these properties are target-specific. The following subsections describe the IR semantics and restrictions to optimization passes for each of these properties.

Pointers with non-address bits

Pointers in this address space have a bitwise representation that not only has address bits, but also some other target-specific metadata. In most cases pointers with non-address bits behave exactly the same as integral pointers, the only difference is that it is not possible to create a pointer just from an address unless all the non-address bits are also recreated correctly in a target-specific way.

An example of pointers with non-address bits are the AMDGPU buffer descriptors which are 160 bits: a 128-bit fat pointer and a 32-bit offset. Similarly, CHERI capabilities contain a 32- or 64-bit address as well as the same number of metadata bits, but unlike the AMDGPU buffer descriptors they have external state in addition to non-address bits.

Unstable pointer representation

[omitted, irrelevant to this discussion]

Pointers with external state

A further special case of non-integral pointers is ones that include external state (such as bounds information or a type tag) with a target-defined size. An example of such a type is a CHERI capability, where there is an additional validity bit that is part of all pointer-typed registers, but is located in memory at an implementation-defined address separate from the pointer itself. Another example would be a fat-pointer scheme where pointers remain plain integers, but the associated bounds are stored in an out-of-band table.

Unless also marked as “unstable”, the bit-wise representation of pointers with external state is stable and ptrtoint(x) always yields a deterministic value. This means transformation passes are still permitted to insert new ptrtoint instructions.

The following restrictions apply to IR level optimization passes:

The inttoptr instruction does not recreate the external state and therefore it is target dependent whether it can be used to create a dereferenceable pointer. In general passes should assume that the result of such an inttoptr is not dereferenceable. For example, on CHERI targets an inttoptr will yield a capability with the external state (the validity tag bit) set to zero, which will cause any dereference to trap. The ptrtoint instruction also only returns the “in-band” state and omits all external state.

When a store ptr addrspace(N) %p, ptr @dst of such a non-integral pointer is performed, the external metadata is also stored to an implementation-defined location. Similarly, a %val = load ptr addrspace(N), ptr @dst will fetch the external metadata and make it available for all uses of %val. Similarly, the llvm.memcpy and llvm.memmove intrinsics also transfer the external state. This is essential to allow frontends to efficiently emit copies of structures containing such pointers, since expanding all these copies as individual loads and stores would affect compilation speed and inhibit optimizations.

Notionally, these external bits are part of the pointer, but since inttoptr / ptrtoint only operate on the “in-band” bits of the pointer and the external bits are not explicitly exposed, they are not included in the size specified in the datalayout string.

When a pointer type has external state, all roundtrips via memory must be performed as loads and stores of the correct type since stores of other types may not propagate the external data. Therefore it is not legal to convert an existing load/store (or a llvm.memcpy / llvm.memmove intrinsic) of pointer types with external state to a load/store of an integer type with the same bitwidth, as that may drop the external state.


AMDGPU buffer descriptors are here noted to be effectively (ptr128, i32): a pointer with non-address bits. The docs for pointers with external state are also relevant here, even though they use CHERI as the motivating example, as they basically describe how LLVM handles (NVI) pointer provenance (although only for address spaces with the non-default e flag).

This helps illustrate what the most general approach to Rust opsem for nonzero-addrspace pointers would be: they have provenance that is maintained by that-pointer-typed copies and in MaybeUninit, but lost by copies at other types. If such a pointer is ever considered dereferenceable in Rust, then its provenance is the same as Rust's (and results in the pointer being accessible or not via exactly the same rules); otherwise the provenance's semantics are defined by the external intrinsics that use the pointer, and it cannot be modified by Rust.
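
For comparison, the analogous rule already exists for ordinary Rust pointers today: provenance survives pointer-typed copies, while a bare round trip through an integer needs the expose escape hatch. A small sketch using the stable strict-provenance API (nothing here is addrspace-specific):

```rust
use core::mem::MaybeUninit;

fn copies(p: *mut u8) -> (*mut u8, *mut u8) {
    // Pointer-typed copy (also through MaybeUninit): provenance is preserved.
    let slot = MaybeUninit::new(p);
    let q = unsafe { slot.assume_init() };

    // Integer round trip: only the address survives; recovering a usable
    // pointer requires the expose/with_exposed_provenance escape hatch.
    let addr = p.expose_provenance();
    let r = core::ptr::with_exposed_provenance_mut::<u8>(addr);

    (q, r)
}
```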

@workingjubilee
Member

workingjubilee commented Jan 7, 2026

I think that we should start from designs that can represent pointers the language already knows about, and that can capture the fact that on some targets, certain kinds of pointees might differ in terms of their meaningful representation.

I am speaking of data pointers and code pointers. For data pointers, that's of course *mut T, and I think the pointer API for that type is reasonably good.

But for code pointers, I think Rust programmers are under-served. I don't think {safety} extern "{abi}" fn(A) -> R is a good code pointer. It's designed around the assumption that you're going to just call it and anything else... like, say, checking alignment to see if it's a valid code pointer on that architecture?... is a secondary concern and usually requires translating it to another format, such as "a data pointer". That is an impoverished API compared to *mut T.

Genericity would be a nice property to have, but in my mind would not be core-essential, as we can later add a generic abstraction over both.

I think that discussing extension in terms of arbitrary address spaces often starts to short-circuit people's brains... my own, at least... on The Sheer Number Of Possibilities. We start talking about things like "of course pointers into different address spaces are pointers into disjoint domains..." because address spaces in a fully generic case might need to be reasoned about thusly. Or we then say "of course pointers into different address spaces can always be pointers into the same domain" because we reason entirely conservatively about it!

If we wind up overcalibrating our design on something, I'd rather it be "okay, wasm, a target we definitely already care about, has funny function pointers, which we definitely already care about, that don't really alias its conventional data space, but they do on x86-64, a target we definitely already care about, and Rust programmers should be able to write code that works on both x86-64 and wasm". That has already complicated new attempts to extend the language!

That seems a better place to be in, design-wise, than focusing on OpenCL, which is not a target, but rather is a dialect of C designed to provide its own abstractions and for which we have every reason to interrogate its decisions.

For instance, OpenCL appeals to Embedded C for its support of specific address spaces, but then states that generic address space support is optional... which is not the case in Embedded C's definition:

ISO/IEC 9899:1999 stipulates that all objects are allocated in a single, common address space. For the purpose of this Technical Report, the C language is extended to support additional address spaces. When not specified otherwise, objects are allocated by default in a generic address space, which corresponds to the single address space of ISO/IEC 9899:1999. In addition to the generic address space, an implementation may support other, named address spaces. Objects may be allocated in these alternate address spaces, and pointers may be defined that point to objects in these address spaces. It is intended that pointers into an address space only need be large enough to support the range of addresses in that address space.

I realize that for @Flakebi the GPU case is more important, but I think that if we wade too deep into the weeds right away, we won't manage to get over the first hurdle.

@workingjubilee
Member

workingjubilee commented Jan 7, 2026

I don't want us to spend lots of time engaging with C (or here, I guess, OpenCL?) Ghost Stories... weird gaps in the Standardese which are justified by only one particular target that technically have to be taken seriously by WG14 or Khronos, but that in practice wind up undersupported by C implementations outside some company's private toolchain which implements a weird dialect of the language at best.

In some cases, like the "va_end must be called in the same scope" ghost story, it's because of one target (Pyramid Technologies) that's been dead for literally 30 years. Like dead-dead. Like "at least you can get a DEC Alpha and even boot ancient Windows on it, because it was useful enough that those who bought the DEC Alpha's intellectual property kept making it for a few years, whereas even Pyramid was killing its own architecture off before they died" dead.

That doesn't mean we need to ignore OpenCL where it's useful. The way another language has approached these abstractions can be informative. But if someone has a relict GPU that doesn't support a generic memory space at all yet wants to make it a Rust target, we can simply tell them to take a hike instead of entertaining them.

@RalfJung
Member

RalfJung commented Jan 8, 2026

However, if compiler-implemented types are useful outside of addrspace pointers, maybe a new more generally useful TyKind can be added.

That's the question, isn't it -- are there more such niche primitive types that we want to expose without giving them native syntax? Given how widespread the ramifications of a type's existence are (going far beyond a regular intrinsic), I am not sure it's a good idea to have such "intrinsic types".

It would be really nice to hear from a t-types person here... let me bring this up on their Zulip.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 8, 2026
…ubilee

Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang#135024

r? RalfJung as you are already aware of the background (feel free to re-assign)
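
For context, a hedged usage sketch of the intrinsic described above. The byte offset of workgroup_size_x (4, as a u16) follows my reading of the HSA kernel dispatch packet layout and is an assumption to verify against the HSA spec:

```rust
// Sketch only: assumes the amdgpu_dispatch_ptr intrinsic above is in scope and
// that workgroup_size_x is a u16 at byte offset 4 of the dispatch packet
// (an assumption; check the HSA spec before relying on it).
#[cfg(target_arch = "amdgpu")]
fn workgroup_size_x() -> u16 {
    let packet: *const () = amdgpu_dispatch_ptr();
    unsafe { packet.cast::<u8>().add(4).cast::<u16>().read() }
}
```
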
@comex
Contributor

comex commented Jan 8, 2026

That's the question, isn't it -- are there more such niche primitive types which we want to expose without giving them native syntax? Given how wide-spread the ramifications of a type's existence are (going far beyond a regular intrinsic), I am not sure it's a good idea to have such "intrinsic types".

Two potential candidates that come to mind are ARM SVE variable-length vectors and WebAssembly externrefs. Similar to CAD97's "Pointers with external state" category, these are target-specific types that are difficult to load and store to memory. Though, as you know, externrefs are difficult enough that it may be a non-goal for Rust to support them.

@RalfJung
Member

RalfJung commented Jan 8, 2026

The types we are talking about in this PR are not difficult to load and store, are they? They are normal types in that regard.

Scalable vectors are a new sort of primitive type indeed. IIRC the current proposals treat them similarly to the existing fixed-length SIMD vectors. That doesn't involve fully replacing the fields with something else, but arguably the field type is a bit of a stretch for those types as well -- fair.

@comex
Contributor

comex commented Jan 9, 2026

Well, the LLVM LangRef, under Non-Integral Pointer Type -> Pointers with external state*, says:

When a store ptr addrspace(N) %p, ptr @dst of such a non-integral pointer is performed, the external metadata is also stored to an implementation-defined location. Similarly, a %val = load ptr addrspace(N), ptr @dst will fetch the external metadata and make it available for all uses of %val. Similarly, the llvm.memcpy and llvm.memmove intrinsics also transfer the external state.

Which sounds difficult in some sense.

* I previously misattributed this to CAD97 because I misread and thought the text behind <details> in their last comment was written by them rather than a quote from the LangRef.

@RalfJung
Member

RalfJung commented Jan 9, 2026

Ah, yeah those are truly cursed and there will always be valid Rust code that doesn't run on them, i.e. we need some sort of scheme for crates to opt in and indicate support for them (similar to CHERI).

But the GPU address spaces we are talking about do not have such external state, do they?

@Flakebi
Contributor Author

Flakebi commented Jan 9, 2026

But the GPU address spaces we are talking about do not have such external state, do they?

Correct, this wording is only for CHERI.
The non-integral amdgpu pointers only have the property that inttoptr (add x, (ptrtoint ptr)) != getelementptr i8, ptr, x, because the pointer contains metadata (like a size for out-of-bounds checks) and an integer add could overflow into the metadata part. A getelementptr adds only to the offset part of the pointer.
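
A rough Rust-level analogue of that distinction, using only today's stable pointer API (the lowering notes in the comments are approximate and not amdgpu-specific):

```rust
// Offset-style add vs. integer round trip. On a non-integral address space the
// first keeps the metadata intact, the second may not (per the comment above).
fn adds(p: *const u8, x: usize) -> (*const u8, *const u8) {
    // Roughly a getelementptr-style offset: only the offset part changes.
    let via_gep = p.wrapping_add(x);

    // Roughly ptrtoint + add + inttoptr: the add happens on the integer value,
    // so on amdgpu it could spill into the metadata bits.
    let via_int = core::ptr::with_exposed_provenance::<u8>(p.expose_provenance().wrapping_add(x));

    (via_gep, via_int)
}
```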

Labels

  • A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
  • A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs`
  • S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
  • T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
  • T-lang Relevant to the language team
  • T-libs Relevant to the library team, which will review and decide on the PR/issue.
