Add AddrspacePtr for pointers to non-0 addrspaces #150452
Conversation
A new lang item `addrspace_ptr_type` is added. It must be a `struct AddrspacePtr<T: 'static, const ADDRSPACE: u32>`. The content of this struct as written in the Rust source is ignored when computing the type layout. It is replaced by a pointer into address space `ADDRSPACE`. As the content of the struct is replaced inside the compiler, it cannot be “looked at” in the Rust source. To do something with the type (more than just holding it), rustc_intrinsics are used. Intrinsics are added to

- cast between `AddrspacePtr<T, addrspace::GENERIC>` and `*mut T` (a purely syntactic type conversion)
- cast to a different type `T` (a no-op, just type conversion) or cast to a different address space (`addrspacecast` in llvm)
- convert to an integer (`ptrtoint` in llvm)
- add to the offset (`ptr::[wrapping_]offset`, `getelementptr` in llvm)
- read/write through the pointer (`load`/`store` in llvm)

This should satisfy all basic needs for operations on pointers. It is disallowed to deref an `AddrspacePtr` directly in Rust; read/write should be used instead. Rustc still uses deref internally to implement read/write.
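As a rough illustration of the intended shape, here is a minimal usage sketch; the free functions used below (`addrspace_ptr_from_mut`, `addrspace_ptr_offset`, `addrspace_ptr_read`) are placeholder names for the intrinsics described above, not necessarily the spellings this PR uses:

```rust
// Minimal sketch under the proposed feature; the intrinsic names are hypothetical.
use core::ptr::{addrspace, AddrspacePtr};

unsafe fn read_second_element(p: *mut u32) -> u32 {
    // Purely syntactic conversion: *mut T <-> AddrspacePtr<T, GENERIC>.
    let p: AddrspacePtr<u32, { addrspace::GENERIC }> = addrspace_ptr_from_mut(p);
    // Offsetting lowers to `getelementptr` (counted in elements of T).
    let p = addrspace_ptr_offset(p, 1);
    // Reading lowers to a `load` from the pointer's address space.
    addrspace_ptr_read(p)
}
```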
|
The approach looks reasonable to me, but I'm not that familiar with this part of the compiler and I'm not on the reviewers list, so r? compiler |
|
This seems to be a new language feature, so I would expect a bit more time spent on figuring out a proper design before we add this to the compiler. It may need an RFC. At the very least, we should have a description of what this type does that does not mention LLVM at all -- LLVM is an implementation detail of Rust, not something users should be exposed to, and it is not the only codegen backend. |
```rust
#[unstable(feature = "ptr_addrspace", issue = "none")]
#[lang = "addrspace_ptr_type"]
#[allow(missing_debug_implementations)]
pub struct AddrspacePtr<T: 'static, const ADDRSPACE: u32> {
```
This file is big enough. Please move the new things to a new file.
```rust
#[unstable(feature = "ptr_addrspace", issue = "none")]
#[cfg(any(doc, target_arch = "amdgpu", target_arch = "nvptx64"))]
#[doc(cfg(any(target_arch = "amdgpu", target_arch = "nvptx64")))]
pub const CONST: u32 = 4;
```
What are these random constants, where do they come from? If they are LLVM-specific, then they should not show up here -- after all, what should the cranelift/GCC backend or Miri do with them?
They're essentially arbitrary identifiers for identifying what address space a nonlocal pointer points into.
The most correct representation would probably be a #[non_exhaustive] enum, but I don't know how difficult that would be to introspect in the compiler compared to a simple const generic.
Arbitrary identifiers defined by who? Who is the receiver that interprets these numbers? Are they fixed by the target ABI or so, or is it just something internal to the LLVM codegen backends for these targets?
The names are defined by the OpenCL standard, "with some additions." The numbers "correspond to target architecture specific LLVM address space numbers used in LLVM IR" and are defined by the AMDGPU target.
The OpenCL standard specifically talks about LLVM...?!? Aren't there non-LLVM implementations?
Oh, you said names, not numbers. The numbers are not in the standard, just in LLVM.
That makes it likely a bad idea to have them in library...
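For concreteness, this is the kind of target-specific numbering being discussed; a purely illustrative sketch of such constants, following the address-space assignment documented for LLVM's AMDGPU backend (the module name and constant names are made up here, and as noted above it is unclear whether these numbers belong in the library at all):

```rust
// Illustrative only: AMDGPU address-space numbers as used by the LLVM backend
// (0 = generic/"flat", 1 = global, 3 = local/LDS, 4 = constant, 5 = private/scratch).
// Other backends (GCC, Cranelift) and Miri would need their own mapping or a
// target-neutral definition.
#[cfg(target_arch = "amdgpu")]
pub mod amdgpu_addrspace {
    pub const GENERIC: u32 = 0;
    pub const GLOBAL: u32 = 1;
    pub const LOCAL: u32 = 3;
    pub const CONSTANT: u32 = 4;
    pub const PRIVATE: u32 = 5;
}
```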
|
FWIW, I believe this is mostly intended to be used internally for the Rust-GPU work, rather than being on the stabilization track. This feature reduced to absurdity is "what do we need to link to LLVM GPU intrinsics" and not intended for use beyond that. At least currently, I don't think any target other than GPU targets and maybe WASM has any use for nonzero address spaces.

Is there an @ group to ping for the GPU target?
```rust
///
/// # Safety
///
/// If the targets raw pointer address space matches the generic address space, this function
```
the function returns a pointer in the generic address space, so why this condition?
Pointers in Rust are mostly in the generic address space, but there are exceptions.
For example with the avr-none target, *const u32 would be in the generic address space (so, no problem to convert),
but *const fn() is in address space 1 (due to the P1 in the data layout), therefore converting it to the generic address space involves an address space cast.
If we say this must always be possible (well, one can write *const fn() as *const u32 in safe Rust and it results in an addrspacecast), I can remove this safety comment.
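For reference, a hedged sketch of the cast mentioned above (the cast itself is ordinary safe Rust on any target; the claim that it lowers to an `addrspacecast` on avr-none comes from the explanation above, not from verifying the codegen here):

```rust
// On avr-none, the `P1` in the data layout places code in LLVM address space 1
// while data pointers use address space 0, so this safe pointer cast is
// described above as lowering to an `addrspacecast`.
fn to_data_ptr(p: *const fn()) -> *const u32 {
    p as *const u32
}
```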
Thanks for the explanation! Maybe the comment could be clarified to "If the target's address space for this pointer type (e.g. data pointer vs function pointer) matches the generic address space ..."
|
Thanks for taking a look!
Exactly that 😊 I want to access

As a stop-gap solution in the

When looking at it again this year – also keeping in mind that there are more intrinsics than dispatch.ptr that use addrspaces – I tried to find a generic way that doesn’t need changes in rustc for every affected intrinsic. So, I’m happy with any solution that lets me call

I think a generic

How do you prefer Rust to call
|
I think the main point is that whatever you add to

From the OpenCL docs, it seems like the concept of having multiple address spaces is not LLVM-specific. But these specific numbers you used are. So what about having an

We should also somewhere have documentation of what these address spaces even are. From the little I remember about LLVM address spaces, they have a peculiar way of using the term that's somewhat distinct from what one might expect. (For instance, I would by default expect every address space to be entirely independent of all the others, referring to its own disjoint domain of things. That makes the operation of casting between address spaces entirely meaningless. But apparently LLVM address spaces are something different, not about the meaning of the address but just about the exact instruction used to compile loads and stores? No idea why that's a part of the type rather than a flag at the load/store... I am probably misunderstanding something. But you get the idea. If you add new fundamental language concepts to Rust, we have to make sure they can be understood by all core stakeholders that relate to Rust semantics, including those with no background in GPU programming. It doesn't matter that libcore and libstd are the only intended direct users of that feature -- that means we can skip the syntax discussion, but there's no shortcut to properly defining the semantics of everything libcore uses.)

Note in particular that rustc needs to know the size and alignment of all its types without going through LLVM. For all these address spaces, everything relevant about them should be fully defined and documented inside Rust. There should be enough information such that supporting those address spaces in Miri is "just" a matter of implementation, not a matter of figuring out what exactly each operation even does.
Just call it and then cast the result to address space 0? A Rust intrinsic can expand to multiple LLVM operations. So far I don't understand why non-default address spaces ever need to be visible in the Rust code. But I don't really understand address spaces so that is not surprising. ;) The PR seems to assume the reader already knows LLVM's notion of address spaces and has accepted it as necessity. EDIT: Ah you already talked about that...
Directly calling LLVM intrinsics is a last resort and a big headache for everyone else. |
```rust
pub struct AddrspacePtr<T: 'static, const ADDRSPACE: u32> {
    // Struct implementation is replaced by the compiler.
    // This field is here for using the generic argument but cannot be set or accessed in any way.
    do_not_use: PhantomData<*const T>,
```
This seems like a very fragile hack -- having fields that then get removed half-way through compilation. There's a huge risk of confusing some part of the compiler that still looks at those fields. Not even Box does something this cursed, and that type is already quite cursed...
I agree with your sentiment, I thought the same when I initially tried to get it working.
It is similar to #[rustc_intrinsic]s with a dummy/fallback body that gets replaced during compilation.
I would argue that while fallback intrinsic bodies have the risk of getting inlined and therefore silently mis-compile a program, the dummy struct body here will always fail compilation if used so it’s actually safer than the intrinsics that rustc already uses.
Rustc has so far only replaced function bodies; replacing a type body is something new, which is why I explicitly called this out in the PR description (in Somewhat Open Points).

> Note in particular that rustc needs to know the size and alignment of all its types without going through LLVM.
This works perfectly fine (see also the test I wrote in the PR), Rust knows the correct size and alignment because we replace the type layout of AddrspacePtr with a valid layout of a pointer to the specified address space.
Every access to the struct goes through the type layout, that’s where it fails compilation if some code tries to access the body.
Re-reading my comment, I want to clarify it, so it doesn’t come across wrongly: I wanted to express that I think replacing the struct body as implemented here is not as bad as it looks at a first glance.
There are probably many ways to improve it, make it look better and more robust. But I didn’t find the better places to modify so far. So for everyone who reads this comment, if you know more about the Rust compiler than I do (well, that’s a given ;)) and you have concrete or not-so-concrete ideas for improvement, let me hear them!
So, you prefer having a I’ll try to answer this question anyway:
As far as I know, LLVM doesn’t really define what an address space is or how one is supposed to use it. It makes different pointer types with e.g. different sizes, but that’s about it. |
Well, it's the "obvious" alternative. It's hard for me to judge the trade-offs here as I don't have an overview over the design space. :) That's why it would help to have a bit of a design document for what the desired goal here looks like -- are we talking about a single intrinsic, or a dozen, or hundreds, etc? |
Counting intrinsics (IR/IntrinsicsAMDGPU.td and IR/IntrinsicsNVVM.td in llvm) that use non-0 address spaces, there are about 80 (a few more that I skipped because they’re even more special purpose; plus I didn’t count non-intrinsics/normal operations on these address spaces like load/store/atomics). The more important ones, e.g. those necessary to implement a useful std/core, amount to about a dozen intrinsics.
|
For my 2¢, I do think it makes sense for the implementation primitive to be generic over the address space. But I also think it potentially makes more sense for the type

A lang-item

That what addrspaces are available is cfg dependent also makes me lean more towards a library generic than a primitive per addrspace.

Changing the pointer repr based on pointee type à la fat pointers is perhaps the most fitting design, but alternative pointer reprs that aren't just

Although, all that said, I have a bit of a bias towards presupposing that the compiler "should" support arbitrary addrspaces in a uniform manner instead of each one directly. If that predicate is incorrect, my thoughts here become moot.
|
This appears to be firmly in compiler territory, which I'm not a reviewer for. r? compiler |
|
I don't expect this convo to go anywhere at the moment, I'm just taking this off the hot potato cycle while we figure out what we even want to do in this arena of thought. |
@CAD97 Mm, I feel that's not very coherent here, if I'm understanding you correctly? Maybe I'm not? Because it's not clear to me at all that we would determine the overall layout of the pointer based on the addrspace of the pointee. For instance, a |
|
@workingjubilee In a world where addrspace was set by pointee type, the data part of the pointer (i.e. |
|
Hmm. Well I definitely don't see why we couldn't have ...dyn Trait may be stressing a lot of things though yeah. |
I deliberately did not propose to extend the existing raw pointers with an addrspace parameter. But the "struct whose fields are replaced" is a weird hack that leads to a compiler with a split mind, where depending on which APIs are used one sees entirely different fields. This is the kind of design that leads to lots of hair-pulling, confusion, and cursing down the line when people who are not in this discussion (or who forgot about it) try to figure out what on Earth is going on. Also, from a t-types and t-opsem perspective, having the type "only for internal use" simplifies absolutely nothing. We still have to fully specify how it is supposed to work and ensure it behaves correctly. |
|
Yeees, I think it's better to just embed "the ADT has a special layout" as, er, "the ADT has a special layout", rather than having magical disappearing/reappearing fields. |
|
We don't really have "ADTs whose size is determined by the backend" though. I'm not sure what the best way is to represent such types. One could imagine something like

```rust
#[cfg(amdgpu)]
#[repr(addrspace(...))]
struct LocalPtr(...);
```

but then the library would still have to know the right size and alignment for the field. The only types we have where size and alignment are entirely determined by the target are primitive types: usize, pointers.
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang#150452 about using address spaces from the Rust language in some way. As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu. The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference. The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.

The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function. Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:

```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang#135024

r? RalfJung as you are already aware of the background (feel free to re-assign)
|
Pointers into other address spaces that have different layouts should have their own data layouts given in the data layout string of an LLVM target. I believe we already have some augmentation for the sake of this, it would be a matter of bringing that home. |
|
Yes, that's the "easy" part. The hard part is, how are those pointer types exposed in the language? |
|
When implementing this, I was searching for existing “compiler-defined” types. It was mentioned above that one could add a new TyKind
Just to make sure we’re on the same page, you mean the type layout of pointers as defined in the data-layout string is different, right? For future reference, I found where gcc defines the amdgpu address spaces: https://github.com/gcc-mirror/gcc/blob/7132a4a945579f096b59a59460196f0f44fbe18b/gcc/config/gcn/gcn.h#L577 (not identical to llvm but similar) |
Correct.
Unsure. I think it might be fine to leave them as ADTs, but extend rustc to be able to recognize that an ADT can have a "layout is an implementation detail" hook and that it isn't simply its recursive fields + tag. I have been working on a fork which does this for a different type: I simply treated the type as a |
|
It might not be obvious to everyone (it wasn't to me), but the non-default address spaces are already defined by the target datalayout string, and may be referred to by name (if provided) in addition to the index (which is required). The most directly relevant part of The OpenCL Specification is:

§3.3.1. Fundamental Memory Regions

Memory in OpenCL is divided into two parts.
Device memory consists of four named address spaces or memory regions [LLVM address spaces]:

- Global Memory
- Constant Memory
- Local Memory
- Private Memory
The memory regions and their relationship to the OpenCL Platform model are summarized below. Local and private memories are always associated with a particular device. The global and constant memories, however, are shared between all devices within a given context. An OpenCL device may include a cache to support efficient access to these shared memories.

To understand memory in OpenCL, it is important to appreciate the relationships between these named address spaces. The four named address spaces available to a device are disjoint meaning they do not overlap. This is a logical relationship, however, and an implementation may choose to let these disjoint named address spaces share physical memory.

Programmers often need functions callable from kernels where the pointers manipulated by those functions can point to multiple named address spaces. This saves a programmer from the error-prone and wasteful practice of creating multiple copies of functions; one for each named address space. Therefore the global, local and private address spaces belong to a single generic address space. This is closely modeled after the concept of a generic address space used in the embedded C standard (ISO/IEC 9899:1999). Since they all belong to a single generic address space, the following properties are supported for pointers to named address spaces in device memory:
The constant address space is disjoint from the generic address space.
The addresses of memory associated with memory objects in Global memory are not preserved between kernel instances, between a device and the host, and between devices. In this regard global memory acts as a global pool of memory objects rather than an address space. This restriction is relaxed when shared virtual memory (SVM) is used.
SVM causes addresses to be meaningful between the host and all of the devices within a context hence supporting the use of pointer based data structures in OpenCL kernels. It logically extends a portion of the global memory into the host address space giving work-items access to the host address space. On platforms with hardware support for a shared address space between the host and one or more devices, SVM may also provide a more efficient way to share data between devices and the host. Details about SVM are presented in Shared Virtual Memory.

The generic address space, if supported (not all processors in the target support it), is our

Global, Local, and Private pointers are semantically closer to offset pointers into object pools than memory pointers into an address space. While conversion between a nonzero memory region and a Generic pointer is near-trivial (no-op for Global or Constant; prepending/truncating a 32bit register value prefix for Local or Private), access through the Generic address space is meaningfully more expensive than through the specific address space, if the value can be known to be in that memory region.

The OpenCL Standard, as far as I can tell, does not define the Region region / Global Data Store (GDS) / wavefront-local address space provided by LLVM AMDGPU. (The Local Region / Local Data Store (LDS) is workgroup-local.)

The LLVM overview docs for non-integral pointer types are also relevant here:

Non-Integral Pointer Type

Note: non-integral pointer types are a work in progress, and they should be considered experimental at this time.

For most targets, the pointer representation is a direct mapping from the bitwise representation to the address of the underlying memory location. Such pointers are considered “integral”, and any pointers where the representation is not just an integer address are called “non-integral”. Non-integral pointers have at least one of the following three properties:
These properties (or combinations thereof) can be applied to pointers via the datalayout string. The exact implications of these properties are target-specific. The following subsections describe the IR semantics and restrictions to optimization passes for each of these properties.

Pointers with non-address bits

Pointers in this address space have a bitwise representation that not only has address bits, but also some other target-specific metadata. In most cases pointers with non-address bits behave exactly the same as integral pointers, the only difference is that it is not possible to create a pointer just from an address unless all the non-address bits are also recreated correctly in a target-specific way.

An example of pointers with non-address bits are the AMDGPU buffer descriptors which are 160 bits: a 128-bit fat pointer and a 32-bit offset. Similarly, CHERI capabilities contain a 32- or 64-bit address as well as the same number of metadata bits, but unlike the AMDGPU buffer descriptors they have external state in addition to non-address bits.

Unstable pointer representation

[omitted, irrelevant to this discussion]

Pointers with external state

A further special case of non-integral pointers is ones that include external state (such as bounds information or a type tag) with a target-defined size. An example of such a type is a CHERI capability, where there is an additional validity bit that is part of all pointer-typed registers, but is located in memory at an implementation-defined address separate from the pointer itself. Another example would be a fat-pointer scheme where pointers remain plain integers, but the associated bounds are stored in an out-of-band table. Unless also marked as “unstable”, the bit-wise representation of pointers with external state is stable and

The following restrictions apply to IR level optimization passes:

Notionally, these external bits are part of the pointer, but since

When a pointer type has external state, all roundtrips via memory must be performed as loads and stores of the correct type since stores of other types may not propagate the external data. Therefore it is not legal to convert an existing load/store (or a

AMDGPU buffer descriptors are here noted to be effectively

This helps illustrate what the most general approach for Rust opsem for nonzero addrspace pointers would be: they have provenance that is maintained by that-pointer-typed copies and in
|
I think that we should start from designs that can represent pointers the language already knows about, and that can capture the fact that on some targets, certain kinds of pointees might differ in terms of their meaningful representation. I am speaking of data pointers and code pointers. For data pointers, that's of course

But for code pointers, I think Rust programmers are under-served. I don't think

Genericity would be a nice property to have, but in my mind would not be core-essential, as we can later add a generic abstraction over both.

I think that discussing extension in terms of arbitrary address spaces often starts to short-circuit people's brains... my own, at least... on The Sheer Number Of Possibilities. We start talking about things like "of course pointers into different address spaces are pointers into disjoint domains..." because address spaces in a fully generic case might need to be reasoned about thusly. Or we then say "of course pointers into different address spaces can always be pointers into the same domain" because we reason entirely conservatively about it!

If we wind up overcalibrating our design on something, I'd rather it be "okay, wasm, a target we definitely already care about, has funny function pointers, which we definitely already care about, that don't really alias its conventional data space, but they do on x86-64, a target we definitely already care about, and Rust programmers should be able to write code that works on both x86-64 and wasm". That has already complicated new attempts to extend the language! That seems a better place to be in, design-wise, than focusing on OpenCL, which is not a target, but rather is a dialect of C designed to provide its own abstractions and for which we have every reason to interrogate its decisions.

For instance, OpenCL appeals to Embedded C for its support of specific address spaces, but then states that generic address space support is optional... which is not the case in Embedded C's definition:
I realize that for @Flakebi the GPU case is more important, but I think that if we wade too deep into the weeds right away, we won't manage to get over the first hurdle. |
|
I don't want us to spend lots of time engaging with C (or here, I guess, OpenCL?) Ghost Stories... weird gaps in the Standardese which are justified by only one particular target that technically have to be taken seriously by WG14 or Khronos, but that in practice wind up undersupported by C implementations outside some company's private toolchain which implements a weird dialect of the language at best. In some cases, like the " That doesn't mean we need to ignore OpenCL where it's useful. The way another language has approached these abstractions can be informative. But if someone has a relict GPU that doesn't support a generic memory space at all yet wants to make it a Rust target, we can simply tell them to take a hike instead of entertaining them. |
That's the question, isn't it -- are there more such niche primitive types which we want to expose without giving them native syntax? Given how wide-spread the ramifications of a type's existence are (going far beyond a regular intrinsic), I am not sure it's a good idea to have such "intrinsic types". It would be really nice to hear from a t-types person here... let me bring this up on their Zulip. |
Two potential candidates that come to mind are ARM SVE variable-length vectors and WebAssembly externrefs. Similar to CAD97's "Pointers with external state" category, these are target-specific types that are difficult to load and store to memory. Though, as you know, externrefs are difficult enough that it may be a non-goal for Rust to support them. |
|
The types we are talking about in this PR are not difficult to load and store, are they? They are normal types in that regard. Scalable vectors are a new sort of primitive type indeed. IIRC the current proposals treat them similar to the existing fixed-length SIMD vectors. That doesn't involve fully replacing the fields with something else, but arguably the field type is a bit of a stretch for those types as well -- fair. |
|
Well, the LLVM LangRef, under Non-Integral Pointer Type -> Pointers with external state*, says:
Which sounds difficult in some sense.

* I previously misattributed this to CAD97 because I misread and thought the text behind
|
Ah, yeah those are truly cursed and there will always be valid Rust code that doesn't run on them, i.e. we need some sort of scheme for crates to opt-in to indicate support for them (similar to CHERI). But the GPU address spaces we are talking about do not have such external state, do they? |
Correct, this wording is only for CHERI. |
Summary
This is my proposed implementation for exposing address spaces from rustc to `core` and nightly Rust.
It adds a `struct AddrspacePtr<T: 'static, const ADDRSPACE: u32>` to represent pointers into different address spaces.
This struct translates directly into `ptr addrspace(ADDRSPACE)` in llvm, so can be used with llvm intrinsics and more.

Details
Discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Adding.20support.20for.20non-0.20address.20spaces.20in.20Rust/with/565558602
Related discussion: https://internals.rust-lang.org/t/naming-gpu-things-in-the-rust-compiler-and-standard-library/23833/6
What is an address space?
Some hardware, e.g. GPUs, have different physical memory regions.
To access them, the compiler uses different types of pointers.
On the hardware side, this results in different instructions being used to access each address space.
The different types of pointers have different properties:
Why expose address spaces in Rust?
Access to address spaces is necessary to implement parts of the standard library (for some targets, not for x86 obviously 😉).
Concretely, accessing the launch and workgroup size on amdgpu needs to call the `llvm.amdgcn.dispatch.ptr` intrinsic, which returns a `ptr addrspace(4)`.
Right now, this return type – and therefore the intrinsic – is not representable in Rust.
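For illustration, with such a type the return value of this intrinsic becomes expressible; a hedged sketch (the Rust-side intrinsic name and the pointee type are placeholders, and `addrspace::CONST` refers to the `CONST = 4` constant added for amdgpu elsewhere in this PR):

```rust
// Sketch only: a binding whose return type can carry the `ptr addrspace(4)`
// that llvm.amdgcn.dispatch.ptr actually produces.
#[cfg(target_arch = "amdgpu")]
#[rustc_intrinsic]
pub fn amdgpu_dispatch_ptr() -> AddrspacePtr<u8, { addrspace::CONST }>;
```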
Alternatives
Option 1: It is possible to implement most amdgpu intrinsic support without general address space support.
Needed intrinsics could be implemented as `#[rustc_intrinsic]`, where the backend calls the llvm intrinsic and then does an `addrspacecast` to the generic/default `addrspace(0)`, so a pointer that is representable in Rust is returned.
Needing a rustc_intrinsic for every llvm intrinsic that uses a non-generic address space does not scale that well though.
It would also be hard (or in some cases impossible?) to work with pointers that cannot be converted back and forth to `addrspace(0)` and are non-integral (cannot be converted to/from integers either), like `addrspace(7)` in amdgpu.

Option 2: In the other direction, make all pointers have an address space, instead of introducing a separate, new type.
A type layout already supports pointers with address spaces (this is also used for AddrspacePtr), so extending the Rust type would be a natural extension (maybe even doing the same for references as well).
oli-obk mentioned that macros in type position could be used to write down the type without needing new syntax.
It adds more to the implementation, as every occurrence of `ty::RawPtr` needs to be adjusted for a new parameter.
If we want to expose (raw) address spaces to Rust users in stable, this is the way to go. If we do not want to expose address spaces in their raw form (or we are undecided) and only use them as an implementation detail in std, I think we are better off with a more contained implementation like `AddrspacePtr` that does not leak into every raw pointer type.

Design
My expectation is that apart from enabling core features for GPUs (or potentially other targets), explicit addrspaces will not be a much used feature.
I.e. it will be used by `core`, `std` and maybe low-level libraries, but it won’t see much direct use by Rust’s users.
Therefore this PR does not try to introduce new syntax to the language but uses a struct as a pointer and const generics for the address space.
I also did not replicate all of the `ptr` helpers for that reason. When there is a use-case, more can be added.
In theory, the existing `ptr` intrinsics (offset/read/write/…) could be re-implemented in core using the addrspace_ptr intrinsics, but that would cause (slight) overhead for the common path (always needs a conversion from ptr to addrspace_ptr), therefore I didn’t try it.
There is just one `AddrspacePtr` type for const/mut pointers for simplicity. (I assume that raw pointers are opaque in the way that it is valid behavior to e.g. convert a const `&` reference to a `*mut` ptr and back, as long as the memory behind the `*mut` ptr is only read but not written?)
Functionality that exists on raw pointers but is not implemented on `AddrspacePtr` here is `read/write_volatile`, `copy(_nonoverlapping)` and atomic operations for now (I’m mentioning these just in case someone sees potential problems with the current design and adding these in the future).
Just to mention it for completeness, something related to `AddrspacePtr` that we would like to have in the future is the ability to declare static variables in a certain address space (for GPU workgroup memory).

Implementation
A new lang item `addrspace_ptr_type` is added. It must be a `struct AddrspacePtr<T: 'static, const ADDRSPACE: u32>`.
The content of this struct as written in the Rust source is ignored when computing the type layout. It is replaced by a pointer into address space `ADDRSPACE`.
As the content of the struct is replaced inside the compiler, the content cannot be “looked at” in the Rust source.
To do something with the type (more than just holding it), rustc_intrinsics are used.
Intrinsics are added to

- cast between `AddrspacePtr<T, addrspace::GENERIC>` and `*mut T`, this is just syntactical type conversion
- cast to a different type `T` (a no-op, just type conversion) or cast to a different address space (`addrspacecast` in llvm)
- convert to an integer (`ptrtoint` in llvm)
- add to the offset (`ptr::[wrapping_]offset`, `getelementptr` in llvm)
- read/write from the pointer (`load`/`store` in llvm)

This should satisfy all basic needs for operations on pointers.
It is disallowed to deref an `AddrspacePtr` directly in Rust, read/write should be used instead.
Rustc still uses deref internally to implement read/write, this mirrors raw pointers.
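To give a sense of the surface area, here is a sketch of what the intrinsic signatures implied by the list above could look like; all names and exact shapes are illustrative placeholders, not this PR's actual declarations:

```rust
// Hypothetical intrinsic signatures mirroring the bullet list above.
#[rustc_intrinsic]
pub fn addrspace_ptr_from_ptr<T>(p: *mut T) -> AddrspacePtr<T, { addrspace::GENERIC }>;

#[rustc_intrinsic] // `addrspacecast` when FROM != TO, otherwise a pure type change
pub fn addrspace_ptr_cast<T, U, const FROM: u32, const TO: u32>(
    p: AddrspacePtr<T, FROM>,
) -> AddrspacePtr<U, TO>;

#[rustc_intrinsic] // `ptrtoint`
pub fn addrspace_ptr_addr<T, const AS: u32>(p: AddrspacePtr<T, AS>) -> usize;

#[rustc_intrinsic] // `getelementptr`
pub unsafe fn addrspace_ptr_offset<T, const AS: u32>(p: AddrspacePtr<T, AS>, count: isize) -> AddrspacePtr<T, AS>;

#[rustc_intrinsic] // `load`
pub unsafe fn addrspace_ptr_read<T, const AS: u32>(p: AddrspacePtr<T, AS>) -> T;

#[rustc_intrinsic] // `store`
pub unsafe fn addrspace_ptr_write<T, const AS: u32>(p: AddrspacePtr<T, AS>, value: T);
```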
Somewhat Open Points
Somewhat new introductions where I am unsure if there’s a better way:
- Target-specific constants in `core::ptr::addrspace`, besides `addrspace::GENERIC`; so far `core::ptr` was target-agnostic
- `AddrspacePtr` is (to my knowledge) the first type that has its content/layout defined inside the compiler
- `T: 'static` (and an implicit `Sized`), should it be something different?

r? @CAD97 (assigning you because we already discussed in the Discourse post, feel free to re-assign)