From c443df48ddf1e6c02dad323e866588c396ec0dfc Mon Sep 17 00:00:00 2001 From: Jamie Cunliffe Date: Wed, 18 May 2022 12:38:23 +0100 Subject: [PATCH 1/5] Add a scalable representation to allow support for scalable vectors. --- text/0000-repr-scalable.md | 124 +++++++++++++++++++++++++++++++++++++ 1 file changed, 124 insertions(+) create mode 100644 text/0000-repr-scalable.md diff --git a/text/0000-repr-scalable.md b/text/0000-repr-scalable.md new file mode 100644 index 00000000000..3edbdfd5164 --- /dev/null +++ b/text/0000-repr-scalable.md @@ -0,0 +1,124 @@ +- Feature Name: repr_scalable +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Expanding the SIMD functionality to allow for runtime determined vector lengths. + +# Motivation +[motivation]: #motivation + +Without some support in the compiler it would be impossible to use the +[ACLE](https://developer.arm.com/architectures/system-architectures/software-standards/acle) +[SVE](https://developer.arm.com/documentation/102476/latest/) intrinsics from Arm. + +This RFC will focus on the Arm vector extensions, and will use them for all examples. A large amount of what this +RFC covers is emitting the vscale attribute from LLVM, therefore other scalable vector extensions should work. +In an LLVM developer meeting it was mentioned that RISC-V would use what's accepted for Arm SVE for their vector extensions. +\[[see slide 17](https://llvm.org/devmtg/2019-04/slides/TechTalk-Kruppe-Espasa-RISC-V_Vectors_and_LLVM.pdf)\] + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +This is mostly an extension to [RFC 1199 SIMD Infrastructure](https://rust-lang.github.io/rfcs/1199-simd-infrastructure.html). +An understanding of that is expected from the reader of this. In addition to that, a basic understanding of +[Arm SVE](https://developer.arm.com/documentation/102476/latest/) is assumed. + +Existing SIMD types are tagged with a `repr(simd)` and contain an array or multiple fields to represent the size of the +vector. Given that SVE registers don't have a size known at compile time, they can be +represented as a ZST. Therefore to represent an SVE type, we should add an additional `repr()` to say it is scalable +and have a type marker to specify the element type. + +This RFC is proposing adding an additional representation, `scalable`, that accepts an integer to determine the number of +elements per granule. See the definitions in [the reference-level explanation](#reference-level-explanation) for more information. + +e.g. for a scalable vector f32 type the following could be its representation: + +```rust +#[repr(simd, scalable(4))] +#[derive(Clone, Copy)] +pub struct svfloat32_t { + _ty: [f32; 0], +} +``` +`_ty` is purely a type marker, used to get the element type for the LLVM backend. + + +A simple example that an end user would be able to write for summing of two arrays using functions from the ACLE +for SVE is shown below: +```rust +unsafe { + let step = svcntw() as usize; + for i in (0..SIZE).step_by(step) { + let a = data_a.as_ptr().add(i); + let b = data_b.as_ptr().add(i); + let c = &mut data_c as *mut f32; + let c = c.add(i); + + let pred = svwhilelt_b32(i as _, SIZE as _); + let sva = svld1_f32(pred, a); + let svb = svld1_f32(pred, b); + let svc = svadd_f32_m(pred, sva, svb); + + svst1_f32(svc, pred, c); + } +} +``` +As can be seen by that example the end user wouldn't necessarily interact directly with the changes that are +proposed by this RFC, but might use types and functions that depend on them. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +This will focus on LLVM. No investigation has been done into the alternative codegen back ends. At the time of +writing I believe cranelift doesn't support scalable vectors ([current proposal](https://github.com/bytecodealliance/rfcs/pull/19)), +and the GCC backend is not mature enough to be thinking about this. + +Most of the complexity of SVE will be handled by LLVM and the `vscale` modifier that is applied to vector types. Therefore +changes for this should be fairly minimal for Rust. From the LLVM side this is as simple as calling `LLVMScalableVectorType` +rather than `LLVMVectorType`. + +For a Scalable Vector Type LLVM takes the form ``. +* `elements` multiplied by sizeof(`type`) gives the smallest allowed register size and the increment size. +* `vscale` is a run time constant that is used to determine the actual vector register size. + +For example, with Arm SVE the scalable vector register (Z register) size has to be a multiple of 128 bits, therefore for `f32`, `elements` would +always be four. At run time `vscale` could be 1, 2, 3, through to 16 which would give register sizes of 128, 256, 384 to 2048. + +The scalable representation accepts the number of `elements` rather than the compiler calculating it, which serves +two purposes. The first being that it removes the need for the compiler to know about the user defined types and how to calculate +the required `element` count. The second being that some of these scalable types can have different element counts. For instance, +the predicates used in SVE have different element counts in LLVM depending on the types they are a predicate for. + +Within Rust some of the requirements on a SIMD type would need to be relaxed when the scalable attribute is applied, for instance, +currently the type can't be a ZST this check would need to be conditioned on the scalable attribute not being present, and a check +to ensure a scalable vector is a ZST should be added. +Additionally scalable vector types shouldn't be allowed to be stored in a structure, as the layout of that structure wouldn't be known. +Aside from that check, all other SIMD checks should be valid to do with what the type can contain. + +This should have minimal impact with other language features, to the same extent that the `repr(simd)` has. + + +As mentioned previously `vscale` is a runtime constant. With SVE the vector length can be changed at runtime (e.g. by a +[prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in Linux), but Rust would consider this undefined +behaviour. This is consistent with C and C++ implementations. + +# Drawbacks +[drawbacks]: #drawbacks + +One difficulty with this type of approach is typically vector types require a target feature to be enabled. +Currently, a trait implementation can't enable a target feature, so `Clone` can't be implemented without +setting `-C target-feature` via rustc. + +However, that isn't a reason to not do this, it's a pain point that another RFC can address. + +# Prior art +[prior-art]: #prior-art + +This is a relatively new concept, with not much prior art. C has gone a very similar way to this by using a ZST to +represent the SVE types. Aligning with C here means that most of the documentation that already exists for +the intrinsics in C should still be applicable to Rust. + From b5073c9a9d162aa4e0b2f3be6f972583c419fbca Mon Sep 17 00:00:00 2001 From: Jamie Cunliffe Date: Thu, 19 May 2022 13:27:33 +0100 Subject: [PATCH 2/5] Update PR number --- text/{0000-repr-scalable.md => 3268-repr-scalable.md} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename text/{0000-repr-scalable.md => 3268-repr-scalable.md} (97%) diff --git a/text/0000-repr-scalable.md b/text/3268-repr-scalable.md similarity index 97% rename from text/0000-repr-scalable.md rename to text/3268-repr-scalable.md index 3edbdfd5164..0d0c8d5788d 100644 --- a/text/0000-repr-scalable.md +++ b/text/3268-repr-scalable.md @@ -1,6 +1,6 @@ - Feature Name: repr_scalable -- Start Date: (fill me in with today's date, YYYY-MM-DD) -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Start Date: 2022-05-19 +- RFC PR: [rust-lang/rfcs#3268](https://github.com/rust-lang/rfcs/pull/3268) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) # Summary From c879d455746161a4c1578cae350ae34ea33c3590 Mon Sep 17 00:00:00 2001 From: Jamie Cunliffe Date: Thu, 28 Jul 2022 13:56:37 +0100 Subject: [PATCH 3/5] Updates based on comments --- text/3268-repr-scalable.md | 53 +++++++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 9 deletions(-) diff --git a/text/3268-repr-scalable.md b/text/3268-repr-scalable.md index 0d0c8d5788d..bdc1d6d9550 100644 --- a/text/3268-repr-scalable.md +++ b/text/3268-repr-scalable.md @@ -28,12 +28,10 @@ An understanding of that is expected from the reader of this. In addition to tha [Arm SVE](https://developer.arm.com/documentation/102476/latest/) is assumed. Existing SIMD types are tagged with a `repr(simd)` and contain an array or multiple fields to represent the size of the -vector. Given that SVE registers don't have a size known at compile time, they can be -represented as a ZST. Therefore to represent an SVE type, we should add an additional `repr()` to say it is scalable -and have a type marker to specify the element type. - -This RFC is proposing adding an additional representation, `scalable`, that accepts an integer to determine the number of -elements per granule. See the definitions in [the reference-level explanation](#reference-level-explanation) for more information. +vector. Scalable vectors have a size known (and constant) at run-time, but unknown at compile time. For this we propose a +new kind of exotic type, denoted by an additional `repr()`, and based on a ZST. This additional representation, `scalable`, +accepts an integer to determine the number of elements per granule. See the definitions in +[the reference-level explanation](#reference-level-explanation) for more information. e.g. for a scalable vector f32 type the following could be its representation: @@ -47,6 +45,13 @@ pub struct svfloat32_t { `_ty` is purely a type marker, used to get the element type for the LLVM backend. +This new class of type has some restrictions on it that a normal ZST wouldn't have, and some of the restrictions that a ZST +has do not apply to this new type. +As this type does have a run-time size it can be stored to memory, this is required for spilling to the stack for instance. +This new class of type can't be stored in a structure or a compound type, as the layout of that wouldn't be known at compile +time. + + A simple example that an end user would be able to write for summing of two arrays using functions from the ACLE for SVE is shown below: ```rust @@ -96,15 +101,14 @@ the predicates used in SVE have different element counts in LLVM depending on th Within Rust some of the requirements on a SIMD type would need to be relaxed when the scalable attribute is applied, for instance, currently the type can't be a ZST this check would need to be conditioned on the scalable attribute not being present, and a check to ensure a scalable vector is a ZST should be added. -Additionally scalable vector types shouldn't be allowed to be stored in a structure, as the layout of that structure wouldn't be known. Aside from that check, all other SIMD checks should be valid to do with what the type can contain. This should have minimal impact with other language features, to the same extent that the `repr(simd)` has. As mentioned previously `vscale` is a runtime constant. With SVE the vector length can be changed at runtime (e.g. by a -[prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in Linux), but Rust would consider this undefined -behaviour. This is consistent with C and C++ implementations. +[prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in Linux). However, since this would require a change +to `vscale`, this is considered undefined behaviour in Rust. This is consistent with C and C++ implementations. # Drawbacks [drawbacks]: #drawbacks @@ -122,3 +126,34 @@ This is a relatively new concept, with not much prior art. C has gone a very sim represent the SVE types. Aligning with C here means that most of the documentation that already exists for the intrinsics in C should still be applicable to Rust. +# Future possibilities +[future-possibilities]: #future-possibilities + +## Portable SIMD +For this to work with portable SIMD in the way that portable SIMD is currently implemented, a const generic parameter +would be needed in the `repr(scalable)`. Creating this dependency would be awkward from an implementation point of view +as it would require support for symbols within the literals. + +One potential for having portable SIMD working in its current style would be to have a trait as follows: +```rust +pub trait RuntimeScalable { + type Increment; +} +``` + +Which the compiler can use to get the `elements` and `type` from. + +The above representation could then be implemented as: +```rust +#[repr(simd, scalable)] +#[derive(Clone, Copy)] +pub struct svfloat32_t {} +impl RuntimeScalable for svfloat32_t { + type Increment = [f32; 4]; +} +``` + +Given the differences in how scalable SIMD works with current instruction sets it's worth experimenting with +architecture specific implementations first. Therefore portable scalable SIMD should be fully addressed with +another RFC as there should be questions as to how it's going to work with adjusting the active lanes (e.g. +predication). From 6d1d457a1b95b9a8bc439e5e77c09a46e293bb71 Mon Sep 17 00:00:00 2001 From: Jamie Cunliffe Date: Fri, 6 Oct 2023 16:19:28 +0100 Subject: [PATCH 4/5] Update to include the restrictions on the scalable types. --- text/3268-repr-scalable.md | 92 +++++++++++++++++++++++++------------- 1 file changed, 62 insertions(+), 30 deletions(-) diff --git a/text/3268-repr-scalable.md b/text/3268-repr-scalable.md index bdc1d6d9550..c99f21f4a57 100644 --- a/text/3268-repr-scalable.md +++ b/text/3268-repr-scalable.md @@ -29,7 +29,7 @@ An understanding of that is expected from the reader of this. In addition to tha Existing SIMD types are tagged with a `repr(simd)` and contain an array or multiple fields to represent the size of the vector. Scalable vectors have a size known (and constant) at run-time, but unknown at compile time. For this we propose a -new kind of exotic type, denoted by an additional `repr()`, and based on a ZST. This additional representation, `scalable`, +new kind of exotic type, denoted by an additional `repr()`. This additional representation, `scalable`, accepts an integer to determine the number of elements per granule. See the definitions in [the reference-level explanation](#reference-level-explanation) for more information. @@ -37,23 +37,30 @@ e.g. for a scalable vector f32 type the following could be its representation: ```rust #[repr(simd, scalable(4))] -#[derive(Clone, Copy)] pub struct svfloat32_t { - _ty: [f32; 0], + _ty: [f32], } ``` `_ty` is purely a type marker, used to get the element type for the LLVM backend. -This new class of type has some restrictions on it that a normal ZST wouldn't have, and some of the restrictions that a ZST -has do not apply to this new type. -As this type does have a run-time size it can be stored to memory, this is required for spilling to the stack for instance. -This new class of type can't be stored in a structure or a compound type, as the layout of that wouldn't be known at compile -time. - +This new class of type has the following properties: +* Not `Sized`, but it does exist as a value type. + * These can be returned from functions. +* Heap allocation of these types is not possible. +* Can be passed by value, reference and pointer. +* The types can't have a `'static` lifetime. +* These types can be loaded and stored to/from memory for spilling to the stack, + and to follow any calling conventions. +* Can't be stored in a struct, enum, union or compound type. + * This includes single field structs with `#[repr(trasparent)]`. + * This also means that closures can't capture them by value. +* Traits can be implemented for these types. +* These types are `Unpin`. A simple example that an end user would be able to write for summing of two arrays using functions from the ACLE for SVE is shown below: + ```rust unsafe { let step = svcntw() as usize; @@ -90,49 +97,75 @@ For a Scalable Vector Type LLVM takes the form ``. * `elements` multiplied by sizeof(`type`) gives the smallest allowed register size and the increment size. * `vscale` is a run time constant that is used to determine the actual vector register size. -For example, with Arm SVE the scalable vector register (Z register) size has to be a multiple of 128 bits, therefore for `f32`, `elements` would -always be four. At run time `vscale` could be 1, 2, 3, through to 16 which would give register sizes of 128, 256, 384 to 2048. +For example, with Arm SVE the scalable vector register (Z register) size has to +be a multiple of 128 bits and a power of 2 (via a retrospective change to the +architecture), therefore for `f32`, `elements` would always be four. At run time +`vscale` could be 1, 2, 4, 8, 16 which would give register sizes of 128, 256, +512, 1024 and 2048. While SVE now has the power of 2 restriction, `vscale` could +be any value providing it gives a legal vector register size for the +architecture. The scalable representation accepts the number of `elements` rather than the compiler calculating it, which serves two purposes. The first being that it removes the need for the compiler to know about the user defined types and how to calculate the required `element` count. The second being that some of these scalable types can have different element counts. For instance, the predicates used in SVE have different element counts in LLVM depending on the types they are a predicate for. -Within Rust some of the requirements on a SIMD type would need to be relaxed when the scalable attribute is applied, for instance, -currently the type can't be a ZST this check would need to be conditioned on the scalable attribute not being present, and a check -to ensure a scalable vector is a ZST should be added. -Aside from that check, all other SIMD checks should be valid to do with what the type can contain. - -This should have minimal impact with other language features, to the same extent that the `repr(simd)` has. - - As mentioned previously `vscale` is a runtime constant. With SVE the vector length can be changed at runtime (e.g. by a [prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in Linux). However, since this would require a change to `vscale`, this is considered undefined behaviour in Rust. This is consistent with C and C++ implementations. +## Unsized rules +These types aren't `Sized`, but they need to exist in local variables, and we +need to be able to pass them to, and return them from functions. This means +adding an exception to the rules around returning unsized types in Rust. There +are also some traits (`Copy`) that have a bound on being `Sized`. + +We will implement `Copy` for these types within the compiler, without having to +implement the traits when the types are defined. + +This RFC also changes the rules so that function return values can be `Copy` or +`Sized` (or both). Once returning of unsized is allowed this part of the rule +would be superseded by that mechanism. It's worth noting that, if any other +types are created that are `Copy` but not `Sized` this rule would apply to +those. + # Drawbacks [drawbacks]: #drawbacks -One difficulty with this type of approach is typically vector types require a target feature to be enabled. -Currently, a trait implementation can't enable a target feature, so `Clone` can't be implemented without -setting `-C target-feature` via rustc. +## Target Features +One difficulty with this type of approach is typically vector types require a +target feature to be enabled. Currently, a trait implementation can't enable a +target feature, so some traits can't be implemented correctly without setting `-C +target-feature` via rustc. -However, that isn't a reason to not do this, it's a pain point that another RFC can address. +However, that isn't a reason to not do this, it's a pain point that another RFC +can address. # Prior art [prior-art]: #prior-art -This is a relatively new concept, with not much prior art. C has gone a very similar way to this by using a ZST to -represent the SVE types. Aligning with C here means that most of the documentation that already exists for -the intrinsics in C should still be applicable to Rust. +This is a relatively new concept, with not much prior art. C has gone a very +similar way to this by using sizeless incomplete types to represent the SVE +types. Aligning with C here means that most of the documentation that already +exists for the intrinsics in C should still be applicable to Rust. # Future possibilities [future-possibilities]: #future-possibilities +## Relaxing restrictions +Some of the restrictions that have been placed on these types could possibly be +relaxed at a later time. This could be done in a backwards compatible way. For +instance, we could perhaps relax the rules around placing these in other +types. It could be possible to allow a struct to contain these types by value, +with certain rules such as requiring them to be the last element(s) of the +struct. Doing this could then allow closures to capture them by value. + ## Portable SIMD -For this to work with portable SIMD in the way that portable SIMD is currently implemented, a const generic parameter -would be needed in the `repr(scalable)`. Creating this dependency would be awkward from an implementation point of view -as it would require support for symbols within the literals. +For this to work with portable SIMD in the way that portable SIMD is currently +implemented, a const generic parameter would be needed in the +`repr(scalable)`. Creating this dependency would probably be a source of bugs +from an implementation point of view as it would require support for symbols +within the literals. One potential for having portable SIMD working in its current style would be to have a trait as follows: ```rust @@ -146,7 +179,6 @@ Which the compiler can use to get the `elements` and `type` from. The above representation could then be implemented as: ```rust #[repr(simd, scalable)] -#[derive(Clone, Copy)] pub struct svfloat32_t {} impl RuntimeScalable for svfloat32_t { type Increment = [f32; 4]; From d425c4655068173ac98f2ca531123884c370259a Mon Sep 17 00:00:00 2001 From: Jamie Cunliffe Date: Mon, 20 Jan 2025 17:25:10 +0000 Subject: [PATCH 5/5] Restructure to make clearer the user visible parts and internal. Restructured to make it clearer what a user of Rust will be using and what are internal core_arch implementation details. Added details about the ABI. Updated how Copy works for these types. --- text/3268-repr-scalable.md | 276 ++++++++++++++++++++++++------------- 1 file changed, 181 insertions(+), 95 deletions(-) diff --git a/text/3268-repr-scalable.md b/text/3268-repr-scalable.md index c99f21f4a57..0973fdbf7f1 100644 --- a/text/3268-repr-scalable.md +++ b/text/3268-repr-scalable.md @@ -13,56 +13,39 @@ Expanding the SIMD functionality to allow for runtime determined vector lengths. Without some support in the compiler it would be impossible to use the [ACLE](https://developer.arm.com/architectures/system-architectures/software-standards/acle) -[SVE](https://developer.arm.com/documentation/102476/latest/) intrinsics from Arm. +[SVE](https://developer.arm.com/documentation/102476/latest/) intrinsics from +Arm. -This RFC will focus on the Arm vector extensions, and will use them for all examples. A large amount of what this -RFC covers is emitting the vscale attribute from LLVM, therefore other scalable vector extensions should work. -In an LLVM developer meeting it was mentioned that RISC-V would use what's accepted for Arm SVE for their vector extensions. -\[[see slide 17](https://llvm.org/devmtg/2019-04/slides/TechTalk-Kruppe-Espasa-RISC-V_Vectors_and_LLVM.pdf)\] +This RFC will focus on the Arm vector extensions, and will use them for all +examples. A large amount of what this RFC covers is emitting the vscale +attribute from LLVM, therefore other scalable vector extensions should work. In +an LLVM developer meeting it was mentioned that RISC-V would use what's accepted +for Arm SVE for their vector extensions. \[[see slide +17](https://llvm.org/devmtg/2019-04/slides/TechTalk-Kruppe-Espasa-RISC-V_Vectors_and_LLVM.pdf)\] # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -This is mostly an extension to [RFC 1199 SIMD Infrastructure](https://rust-lang.github.io/rfcs/1199-simd-infrastructure.html). -An understanding of that is expected from the reader of this. In addition to that, a basic understanding of -[Arm SVE](https://developer.arm.com/documentation/102476/latest/) is assumed. +This is an extension to [RFC 1199 SIMD +Infrastructure](https://rust-lang.github.io/rfcs/1199-simd-infrastructure.html). +An understanding of that is expected from the reader of this. -Existing SIMD types are tagged with a `repr(simd)` and contain an array or multiple fields to represent the size of the -vector. Scalable vectors have a size known (and constant) at run-time, but unknown at compile time. For this we propose a -new kind of exotic type, denoted by an additional `repr()`. This additional representation, `scalable`, -accepts an integer to determine the number of elements per granule. See the definitions in -[the reference-level explanation](#reference-level-explanation) for more information. +## Brief introduction to scalable vectors -e.g. for a scalable vector f32 type the following could be its representation: - -```rust -#[repr(simd, scalable(4))] -pub struct svfloat32_t { - _ty: [f32], -} -``` -`_ty` is purely a type marker, used to get the element type for the LLVM backend. - - -This new class of type has the following properties: -* Not `Sized`, but it does exist as a value type. - * These can be returned from functions. -* Heap allocation of these types is not possible. -* Can be passed by value, reference and pointer. -* The types can't have a `'static` lifetime. -* These types can be loaded and stored to/from memory for spilling to the stack, - and to follow any calling conventions. -* Can't be stored in a struct, enum, union or compound type. - * This includes single field structs with `#[repr(trasparent)]`. - * This also means that closures can't capture them by value. -* Traits can be implemented for these types. -* These types are `Unpin`. +A scalable vector type maps to a CPU register where the size of the register +isn't known during compile time, but it will be a fixed size during +runtime. Certain properties will be known about this during compile time for +example, an architecture might specify there is a minimum size to this register +and that its size must be a multiple of specific amounts, or the maximum +possible size of the register for instance. -A simple example that an end user would be able to write for summing of two arrays using functions from the ACLE -for SVE is shown below: +## Using SVE intrinsics +A simple example that an end user would be able to write for summing of two +arrays using functions from the ACLE for SVE is shown below: ```rust unsafe { + // svcntw returns the actual number of elements that are in a 32 bit element vector let step = svcntw() as usize; for i in (0..SIZE).step_by(step) { let a = data_a.as_ptr().add(i); @@ -70,84 +53,174 @@ unsafe { let c = &mut data_c as *mut f32; let c = c.add(i); + // svwhilelt_b32 generates a mask based on comparing the current index + // against the SIZE let pred = svwhilelt_b32(i as _, SIZE as _); + + // svld1_f32 loads a vector register with the data from address a, + // zeroing any elements in the vector that are masked out. let sva = svld1_f32(pred, a); let svb = svld1_f32(pred, b); + + // svadd_f32_m adds a and b, any lanes that are masked out will take the + // keep value of a let svc = svadd_f32_m(pred, sva, svb); + // svst1_f32 will store the result without accessing any memory + // locations that are masked out svst1_f32(svc, pred, c); } } ``` -As can be seen by that example the end user wouldn't necessarily interact directly with the changes that are -proposed by this RFC, but might use types and functions that depend on them. +As can be seen in this example, from a user perspective of writing code for +scalable vectors, it's not all that different from when writing code with a +fixed sized vector. Its arguably easier when working with scalable as you don't +have to worry about being a multiple of your fixed vector size. + +## Internal core_arch library details +Existing SIMD types are tagged with a `repr(simd)` and contain an array or +multiple fields to represent the size of the vector. Scalable vectors have a +size known (and constant) at run-time, but unknown at compile time. For this we +propose a new kind of exotic type, denoted by an additional `repr()`. This +additional representation, `scalable`, accepts an integer to determine the +minimum number of elements the vector contains. See the definitions in [the +reference-level explanation](#reference-level-explanation) for more information. + +e.g. for a scalable vector f32 type the following could be its representation: + +```rust +#[repr(simd, scalable(4))] +pub struct svfloat32_t { + _ty: [f32], +} +``` +`_ty` is purely a type marker, used to get the element type for the LLVM backend. + +It's worth noting that currently there are no plans for stabilizing the +`repr_scalable` attribute. This is purely an internal implementation detail, +that could be changed at a later time if required. The only expected use of this +attribute is to be within stdarch. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -This will focus on LLVM. No investigation has been done into the alternative codegen back ends. At the time of -writing I believe cranelift doesn't support scalable vectors ([current proposal](https://github.com/bytecodealliance/rfcs/pull/19)), -and the GCC backend is not mature enough to be thinking about this. +This will focus on LLVM. No investigation has been done into the alternative +codegen back ends. At the time of writing I believe cranelift doesn't support +scalable vectors ([current +proposal](https://github.com/bytecodealliance/rfcs/pull/19)), and the GCC +backend is not mature enough to be thinking about this. -Most of the complexity of SVE will be handled by LLVM and the `vscale` modifier that is applied to vector types. Therefore -changes for this should be fairly minimal for Rust. From the LLVM side this is as simple as calling `LLVMScalableVectorType` -rather than `LLVMVectorType`. +Most of the complexity of SVE will be handled by LLVM and the `vscale` modifier +that is applied to vector types. Therefore changes for this should be fairly +minimal for Rust. From the LLVM side this is as simple as calling +`LLVMScalableVectorType` rather than `LLVMVectorType`. For a Scalable Vector Type LLVM takes the form ``. -* `elements` multiplied by sizeof(`type`) gives the smallest allowed register size and the increment size. -* `vscale` is a run time constant that is used to determine the actual vector register size. +* `elements` multiplied by sizeof(`type`) gives the smallest allowed register + size and the increment size. +* `vscale` is a run time constant that is used to determine the actual vector + register size. For example, with Arm SVE the scalable vector register (Z register) size has to be a multiple of 128 bits and a power of 2 (via a retrospective change to the -architecture), therefore for `f32`, `elements` would always be four. At run time -`vscale` could be 1, 2, 4, 8, 16 which would give register sizes of 128, 256, -512, 1024 and 2048. While SVE now has the power of 2 restriction, `vscale` could -be any value providing it gives a legal vector register size for the -architecture. +architecture), therefore for `f32`, `elements` would always be four, as with the +minimum `vscale` of 1 you would have `1 * 4 * sizeof(f32)` which would give you +the 128 bit minimum register size. At run time `vscale` could then be 1, 2, 4, +8, 16 which would give register sizes of 128, 256, 512, 1024 and 2048. While SVE +now has the power of 2 restriction, `vscale` could be any value providing it +gives a legal vector register size for the architecture. + +The scalable representation accepts the number of `elements` rather than the +compiler calculating it, which serves two purposes. The first being that it +removes the need for the compiler to know about the user defined types and how +to calculate the required `element` count. The second being that some of these +scalable types can have different element counts. For instance, the predicates +used in SVE have different element counts in LLVM depending on the types they +are a predicate for. + +As mentioned previously `vscale` is a runtime constant. With SVE the vector +length can be changed at runtime (e.g. by a +[prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in +Linux). However, since this would require a change to `vscale`, this is +considered undefined behaviour in Rust. This is consistent with C and C++ +implementations. + +## Properties of these types +* Not `Sized`, but it does exist as a value type. + * More details will be addressed in the sections below. + * These can be returned from functions. + * They can be passed to a function by value, reference or pointer. + * They can exist as local values on the stack. +* Can be passed by value, reference and pointer. +* The types can't be a static variable. +* These types can be loaded and stored to/from memory for spilling to the stack, + and to follow any calling conventions. +* Can't be stored in a struct, enum, union or compound type. + * This includes single field structs with `#[repr(transparent)]`. + * This also means that closures can't capture them by value. +* Traits can be implemented for these types. -The scalable representation accepts the number of `elements` rather than the compiler calculating it, which serves -two purposes. The first being that it removes the need for the compiler to know about the user defined types and how to calculate -the required `element` count. The second being that some of these scalable types can have different element counts. For instance, -the predicates used in SVE have different element counts in LLVM depending on the types they are a predicate for. -As mentioned previously `vscale` is a runtime constant. With SVE the vector length can be changed at runtime (e.g. by a -[prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in Linux). However, since this would require a change -to `vscale`, this is considered undefined behaviour in Rust. This is consistent with C and C++ implementations. +## ABI +With the existing SIMD types in Rust they are currently always passed on the +stack, this was done for ABI reasons as target features at the call site then +don't need to match the function being called. For scalable SIMD the types are +passed in registers. These types cannot exist in a function that +doesn't have the correct target feature for the types. If a call site doesn't +have the feature for the types, we wouldn't even be able to put them onto the +stack, as we wouldn't be able to access the size we need to allocate as the +instruction for that would also be gated under the same target feature. ## Unsized rules These types aren't `Sized`, but they need to exist in local variables, and we -need to be able to pass them to, and return them from functions. This means -adding an exception to the rules around returning unsized types in Rust. There -are also some traits (`Copy`) that have a bound on being `Sized`. - -We will implement `Copy` for these types within the compiler, without having to -implement the traits when the types are defined. - -This RFC also changes the rules so that function return values can be `Copy` or -`Sized` (or both). Once returning of unsized is allowed this part of the rule -would be superseded by that mechanism. It's worth noting that, if any other -types are created that are `Copy` but not `Sized` this rule would apply to -those. - -# Drawbacks -[drawbacks]: #drawbacks - -## Target Features -One difficulty with this type of approach is typically vector types require a -target feature to be enabled. Currently, a trait implementation can't enable a -target feature, so some traits can't be implemented correctly without setting `-C -target-feature` via rustc. - -However, that isn't a reason to not do this, it's a pain point that another RFC -can address. +need to be able to pass them to, and return them from functions. The existing +unsized features (`unsized_locals` and `unsized_fn_params`) can be used for +this. We won't be depending on the features directly but rather using the +existing support within the compiler when the type is a scalable vector type. + +Future RFCs might provide alternatives to those features. For instance, if we +have a [hierarchy of size traits](https://github.com/rust-lang/rfcs/pull/3729) +then we could change the bounds check to use different traits, and avoid the +need for those features. + +As mentioned these types need to be returned from functions to, this RFC also +changes the rules so that functions can return values that can be copied or are +`Sized`. Once returning of unsized is allowed this part of the rule would be +superseded by that mechanism. It's worth noting that, if any other types are +created that can be copied but are not `Sized` this rule would apply to those. + +### Copy +The `Copy` trait has a bound on the type being `Sized`, this is a problem for +these types as they can be copied. + +We will implement the ability for these types to be copied within the compiler, +without having to implement the traits when the types are defined. This is +unsound as it will leave two views of the world to the compiler, any future +checks internally with the `Copy` trait will also need to have this special case +logic. + +Future RFCs (e.g. [sized +hierarchy](https://github.com/rust-lang/rfcs/pull/3729)) can work on solutions +for allowing the bounds of `Copy` to be changed so that these types can +implement the trait to fix this issue up. The `repr(scalable)` attribute has no +plans to be stabilised, and the actual `repr(scalable)` types will not be +stabilised until this issue is addressed. Allowing the bypassing of the `Copy` +trait for now has value to inform the future design of these types. A lot of +restrictions are currently placed on these types, for the type of code that is +expected to be written this might not be too much of an issue, therefore getting +some real world Rust examples will inform the future relaxing of any +restrictions that are placed on these. # Prior art [prior-art]: #prior-art This is a relatively new concept, with not much prior art. C has gone a very similar way to this by using sizeless incomplete types to represent the SVE -types. Aligning with C here means that most of the documentation that already -exists for the intrinsics in C should still be applicable to Rust. +types. Although using those types in C means that you have to apply a edit to +the C and C++ standards. Aligning with C here means that most of the +documentation that already exists for the intrinsics in C should still be +applicable to Rust. In the future we can move away from this close alignment +with C and start to allow uses in more places. # Future possibilities [future-possibilities]: #future-possibilities @@ -156,9 +229,20 @@ exists for the intrinsics in C should still be applicable to Rust. Some of the restrictions that have been placed on these types could possibly be relaxed at a later time. This could be done in a backwards compatible way. For instance, we could perhaps relax the rules around placing these in other -types. It could be possible to allow a struct to contain these types by value, -with certain rules such as requiring them to be the last element(s) of the -struct. Doing this could then allow closures to capture them by value. +types. It could be possible to allow a struct to contain these types by value. +Doing this could then allow closures to capture them by value. It's possible +that they could exist in arrays as it would just be a runtime multiplication to +get the offset. + +This is left for the future as it complicates the stack a lot. LLVM IR mostly +follows C therefore the support for these types in LLVM IR closely matches C. If +we was to attempt to support doing that it would be a lot of codegen work in +rustc to do all the required calculations which then might not lead optimal +code. + +As C doesn't have these features, and code is used in production for SVE, I'm +not aware of much demand for those rules to be changed. So for that reason we +should postpone complicating this feature until real world use cases come up. ## Portable SIMD For this to work with portable SIMD in the way that portable SIMD is currently @@ -167,7 +251,8 @@ implemented, a const generic parameter would be needed in the from an implementation point of view as it would require support for symbols within the literals. -One potential for having portable SIMD working in its current style would be to have a trait as follows: +One potential for having portable SIMD working in its current style would be to +have a trait as follows: ```rust pub trait RuntimeScalable { type Increment; @@ -185,7 +270,8 @@ impl RuntimeScalable for svfloat32_t { } ``` -Given the differences in how scalable SIMD works with current instruction sets it's worth experimenting with -architecture specific implementations first. Therefore portable scalable SIMD should be fully addressed with -another RFC as there should be questions as to how it's going to work with adjusting the active lanes (e.g. -predication). +Given the differences in how scalable SIMD works with current instruction sets +it's worth experimenting with architecture specific implementations +first. Therefore portable scalable SIMD should be fully addressed with another +RFC as there should be questions as to how it's going to work with adjusting the +active lanes (e.g. predication).