
Implement common swizzle operations. #335

Open · wants to merge 12 commits into base: master
127 changes: 127 additions & 0 deletions in crates/core_simd/src/swizzle.rs

@@ -364,4 +364,131 @@ where

        (Even::swizzle2(self, other), Odd::swizzle2(self, other))
    }

    /// Takes a slice of a vector to produce a shorter vector.
    ///
    /// This is equivalent to computing `&self[OFFSET..OFFSET+LEN]` on
    /// the underlying array.
    ///
    /// ```
    /// # #![feature(portable_simd)]
    /// # use core::simd::Simd;
    /// let x = Simd::from_array([0, 1, 2, 3, 4, 5, 6, 7]);
    /// let y = x.slice::<2, 4>();
    /// assert_eq!(y.to_array(), [2, 3, 4, 5]);
    /// ```
    ///
    /// Will be rejected at compile time if `OFFSET + LEN > LANES`.
    #[inline]
    #[must_use = "method returns a new vector and does not mutate the original inputs"]
    pub fn slice<const OFFSET: usize, const LEN: usize>(self) -> Simd<T, LEN>
    where
        LaneCount<LEN>: SupportedLaneCount,
    {
        const fn slice_index<const LEN: usize>(offset: usize, lanes: usize) -> [usize; LEN] {
            assert!(offset + LEN <= lanes, "slice out of bounds");
            let mut index = [0; LEN];
            let mut i = 0;
            while i < LEN {
                index[i] = i + offset;
                i += 1;
            }
            index
        }
        struct Slice<const OFFSET: usize>;
        impl<const OFFSET: usize, const LEN: usize, const LANES: usize> Swizzle<LANES, LEN>
            for Slice<OFFSET>
        {
            const INDEX: [usize; LEN] = slice_index::<LEN>(OFFSET, LANES);
        }
        Slice::<OFFSET>::swizzle(self)
    }

    /// Concatenates two vectors of equal length.
    ///
    /// Due to limitations in const generics, the length of the resulting vector cannot be
    /// inferred from the input vectors. You must specify it explicitly.
    ///
    /// ```
    /// # #![feature(portable_simd)]
    /// # use core::simd::Simd;
    /// let x = Simd::from_array([0, 1, 2, 3]);
    /// let y = Simd::from_array([4, 5, 6, 7]);
    /// let z = x.concat_to::<8>(y);
    /// assert_eq!(z.to_array(), [0, 1, 2, 3, 4, 5, 6, 7]);
    /// ```
    ///
    /// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`.
    #[inline]
    #[must_use = "method returns a new vector and does not mutate the original inputs"]
    pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES>
Member: concat_to? Why not just concat?

Author: I chose concat_to because I imagine we'll want to eventually (when generic_const_exprs stabilizes) deprecate this function in favor of a function that uses generic_const_exprs. I wanted to reserve the better name concat for that new function, which would have the signature:

    fn concat(self, other: Self) -> Simd<T, {LANES * 2}>

The idea behind the concat_to naming is that most call sites will look like x.concat_to::<8>(y) or similar, so it can be read as "concatenate x to length 8 using y".

Open to other suggestions. One option would be to call it concat now and not worry about introducing a breaking change in the future once the generic_const_exprs approach works.

Member: Oh hm.
    where
        LaneCount<DOUBLE_LANES>: SupportedLaneCount,
    {
        const fn concat_index<const DOUBLE_LANES: usize>(lanes: usize) -> [Which; DOUBLE_LANES] {
            assert!(lanes * 2 == DOUBLE_LANES);
Member: Just checking: are you aware that when you put something in a const fn, and the const fn is then evaluated at runtime, an assert! in it will not be evaluated at compilation time, but rather at runtime?

Member: It's only used at compile time, so that doesn't matter here.

Member (@workingjubilee, Mar 13, 2023): Ah, right, I see now: it happens in the associated constant. Then that requires monomorphization, which puts it at risk of depending on trait and const evaluation details. Hmm. I do want to allow us to use certain post-monomorphization errors, but I currently don't understand rustc's evaluation patterns well enough to intuit in which cases this will or will not happen.

There are cases where, due to various inputs you can give to the compiler, "dead" code can potentially get resurrected and monomorphized anyway (starting, of course, with -Clink-dead-code), so I want to better understand whether this will trigger in those cases.

Author: The guarantee: if you don't instantiate this generic at an invalid input/output type pair, then you won't get a post-monomorphization error. So this is like a type error, except with worse ergonomics, because it's raised during monomorphization rather than during type checking.

If we had generic_const_exprs we'd make this a genuine type error rather than a monomorphization-time error, by giving this function the signature concat_to(self, other: Self) -> Simd<T, {LANES * 2}>. I've created commit reinerp@1c2972b on a separate branch to show how this would work. Unfortunately, the test doesn't compile, due to what seems like a bug in generic_const_exprs.

More broadly, I would like to be able to use concat and split operations in my own code without requiring the generic_const_exprs language feature, hence my current approach of avoiding it. From my perspective, generic_const_exprs is much less stable than portable_simd; indeed, portable_simd is the only unstable feature I allow in my own code.

            let mut index = [Which::First(0); DOUBLE_LANES];
            let mut i = 0;
            while i < lanes {
                index[i] = Which::First(i);
                index[i + lanes] = Which::Second(i);
                i += 1;
            }
            index
        }
        struct Concat;
        impl<const LANES: usize, const DOUBLE_LANES: usize> Swizzle2<LANES, DOUBLE_LANES> for Concat {
            const INDEX: [Which; DOUBLE_LANES] = concat_index::<DOUBLE_LANES>(LANES);
        }
        Concat::swizzle2(self, other)
    }
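The assert!-in-const-fn behavior discussed in the thread above can be sketched in plain stable Rust. The names below (checked_double, OK) are invented for illustration and are not part of the PR:

```rust
// When a const fn initializes a const item (or an associated const that
// gets monomorphized), its assert! is evaluated during compilation.
// When the same fn is reached in a runtime context, the assert! is an
// ordinary runtime panic.
const fn checked_double(lanes: usize, double: usize) -> usize {
    assert!(lanes * 2 == double, "length mismatch");
    double
}

// Evaluated at compile time: writing checked_double(3, 8) here would be
// a compile error, not a runtime panic.
const OK: usize = checked_double(4, 8);

fn main() {
    assert_eq!(OK, 8);
    // The same check, reached at runtime, panics like any other assert:
    let result = std::panic::catch_unwind(|| checked_double(3, 8));
    assert!(result.is_err());
}
```

In the PR, the const fn only ever feeds the Swizzle2::INDEX associated constant, which is why the runtime-panic path is unreachable there.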

    /// For each lane `i`, swaps it with lane `i ^ SWAP_MASK`.
    ///
    /// Also known as `grev` in the RISC-V Bitmanip specification, this is a powerful
    /// swizzle operation that can implement many common patterns as special cases.
    ///
    /// ```
    /// # #![feature(portable_simd)]
    /// # use core::simd::Simd;
    /// let x = Simd::from_array([0, 1, 2, 3, 4, 5, 6, 7]);
    /// // Swap adjacent lanes:
    /// assert_eq!(x.general_reverse::<1>().to_array(), [1, 0, 3, 2, 5, 4, 7, 6]);
    /// // Swap lanes separated by distance 2:
    /// assert_eq!(x.general_reverse::<2>().to_array(), [2, 3, 0, 1, 6, 7, 4, 5]);
    /// // Swap lanes separated by distance 4:
    /// assert_eq!(x.general_reverse::<4>().to_array(), [4, 5, 6, 7, 0, 1, 2, 3]);
    /// // Reverse lanes, within each 4-lane group:
    /// assert_eq!(x.general_reverse::<3>().to_array(), [3, 2, 1, 0, 7, 6, 5, 4]);
    /// ```
    ///
    /// Commonly useful for horizontal reductions, for example:
    ///
    /// ```
    /// # #![feature(portable_simd)]
    /// # use core::simd::Simd;
    /// let x = Simd::from_array([0u32, 1, 2, 3, 4, 5, 6, 7]);
    /// let x = x + x.general_reverse::<1>();
    /// let x = x + x.general_reverse::<2>();
    /// let x = x + x.general_reverse::<4>();
    /// assert_eq!(x.to_array(), [28, 28, 28, 28, 28, 28, 28, 28]);
    /// ```
    #[inline]
    #[must_use = "method returns a new vector and does not mutate the original inputs"]
    pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self {
Member: Imho this should include something like "lanewise" in its name, since I expect Rust to gain a bitwise grev at some point, and we'd want the bitwise simd and scalar integer operations to have matching names (probably gen_rev or generalized_reverse or grev?).

Member: For something that effectively implements a very specific xor-striding pattern, general_reverse is a nondescript name.

Member: Well, grev is what that particular bitwise op has become named, thanks to RISC-V, afaict.

Member: grev -- general bit reverse.

Member: Just because an ISA has picked a terrible name doesn't mean we need to copy it.

Author: Looking at the precedent of u32::count_ones, the Rust standard library follows the convention of providing a more meaningful name (count_ones) while keeping the common name popcnt searchable via #[doc(alias = "popcnt")]: https://github.com/rust-lang/rust/blob/f1b1ed7e18f1fbe5226a96626827c625985f8285/library/core/src/num/int_macros.rs#L104. I think such an approach could be warranted here too, keeping grev as an alias. Indeed, grev is a much less established term than popcnt, so its precedent is weaker.

Other names I considered:

  • butterfly_shuffle. The idea here is that, in the common case of SWAP_MASK being a power of 2, it implements one stage of a butterfly network: https://en.wikipedia.org/wiki/Butterfly_network
  • swap_lanes_xor. The "swap lanes" part is pretty self-explanatory. The "xor" part is confusing, however: it suggests that the data bits are being xored, whereas it's actually the lane indices that are being xored.

Currently I lean towards swap_lanes_xor. Open to other suggestions!
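A minimal sketch of the #[doc(alias = ...)] mechanism mentioned above: the descriptive name is primary, while rustdoc's search still matches the ISA name. The function is a hypothetical plain-slice stand-in, not the PR's API:

```rust
// Hypothetical stand-in for the vector op: descriptive primary name,
// with the common ISA name kept searchable via a rustdoc alias (the
// same pattern count_ones uses for "popcnt").
// (Assumes v.len() is a power of two and mask < v.len(), so the xored
// index stays in bounds.)
#[doc(alias = "grev")]
pub fn swap_lanes_xor(v: &[u32], mask: usize) -> Vec<u32> {
    // Lane i takes the value from lane i ^ mask.
    (0..v.len()).map(|i| v[i ^ mask]).collect()
}

fn main() {
    let v = [0, 1, 2, 3, 4, 5, 6, 7];
    assert_eq!(swap_lanes_xor(&v, 1), [1, 0, 3, 2, 5, 4, 7, 6]);
}
```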

Member: swizzle_to_xor_indexes?

Contributor: reverse_pow2_lane_groups? Conceptually this operation reverses blocks of k lanes within n-lane groups, where k and n are both powers of two, k ≤ n ≤ LANES, and the choice of k and n is determined uniquely (up to operations that coincide) by where index 0 is swizzled to (it'll be exactly SWAP_MASK).

Member: Re "reverse_pow2_lane_groups?": afaict grev is actually more powerful than that; it can do any arbitrary combination of those k-n reversals, for arbitrary k and n.

E.g. grev(v, 5) is equivalent to simd_swizzle!(v, [5, 4, 7, 6, 1, 0, 3, 2]), which is a combination of reversing adjacent pairs (blocks of 1) and swapping blocks of 4.

Author: Thanks all for the suggestions. Of the proposals so far, my preference order is:

  1. swizzle_to_xor_indices
  2. butterfly_swizzle
  3. grev

I have gone with swizzle_to_xor_indices. Let me know what you think.

        const fn general_reverse_index<const LANES: usize>(swap_mask: usize) -> [usize; LANES] {
            let mut index = [0; LANES];
            let mut i = 0;
            while i < LANES {
                index[i] = i ^ swap_mask;
                i += 1;
            }
            index
        }
        struct GeneralReverse<const DISTANCE: usize>;
        impl<const LANES: usize, const DISTANCE: usize> Swizzle<LANES, LANES> for GeneralReverse<DISTANCE> {
            const INDEX: [usize; LANES] = general_reverse_index::<LANES>(DISTANCE);
        }
        GeneralReverse::<SWAP_MASK>::swizzle(self)
    }
}
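The review thread's claim that grev masks compose (e.g. mask 5 = mask 1 then mask 4) can be checked in plain stable Rust by modeling only the index permutation; xor_indices is an invented helper, not part of the PR:

```rust
// Model of the lane-index permutation performed by general_reverse:
// lane i reads from lane i ^ mask.
fn xor_indices(mask: usize, lanes: usize) -> Vec<usize> {
    (0..lanes).map(|i| i ^ mask).collect()
}

fn main() {
    // grev(v, 5) is simd_swizzle!(v, [5, 4, 7, 6, 1, 0, 3, 2])...
    assert_eq!(xor_indices(5, 8), [5, 4, 7, 6, 1, 0, 3, 2]);
    // ...which is mask 1 (reverse adjacent pairs) composed with mask 4
    // (swap blocks of 4), since xor masks compose by xor:
    let composed: Vec<usize> = (0..8).map(|i| (i ^ 1) ^ 4).collect();
    assert_eq!(composed, xor_indices(5, 8));
    // Mask 3 reverses each 4-lane group, matching the doc example:
    assert_eq!(xor_indices(3, 8), [3, 2, 1, 0, 7, 6, 5, 4]);
}
```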
57 changes: 57 additions & 0 deletions in crates/core_simd/tests/swizzle.rs

@@ -74,3 +74,60 @@ fn interleave_one() {
    assert_eq!(even, a);
    assert_eq!(odd, b);
}

#[test]
#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
fn slice() {
    let a = Simd::from_array([0, 1, 2, 3, 4, 5, 6, 7]);
    let lo = a.slice::<0, 4>();
    let mid = a.slice::<3, 2>();
    let hi = a.slice::<4, 4>();
    assert_eq!(lo.to_array(), [0, 1, 2, 3]);
    assert_eq!(mid.to_array(), [3, 4]);
    assert_eq!(hi.to_array(), [4, 5, 6, 7]);
}

#[test]
#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
fn concat() {
    let x = Simd::from_array([0, 1, 2, 3]);
    let y = Simd::from_array([4, 5, 6, 7]);
    let z = x.concat_to::<8>(y);
    assert_eq!(z.to_array(), [0, 1, 2, 3, 4, 5, 6, 7]);
}

#[test]
#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
fn general_reverse() {
    let x = Simd::from_array([0, 1, 2, 3, 4, 5, 6, 7]);
    // Swap adjacent lanes:
    assert_eq!(
        x.general_reverse::<1>().to_array(),
        [1, 0, 3, 2, 5, 4, 7, 6]
    );
    // Swap lanes separated by distance 2:
    assert_eq!(
        x.general_reverse::<2>().to_array(),
        [2, 3, 0, 1, 6, 7, 4, 5]
    );
    // Swap lanes separated by distance 4:
    assert_eq!(
        x.general_reverse::<4>().to_array(),
        [4, 5, 6, 7, 0, 1, 2, 3]
    );
    // Reverse lanes, within each 4-lane group:
    assert_eq!(
        x.general_reverse::<3>().to_array(),
        [3, 2, 1, 0, 7, 6, 5, 4]
    );
}

#[test]
#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
fn general_reverse_sum() {
    let x = Simd::from_array([0u32, 1, 2, 3, 4, 5, 6, 7]);
    let x = x + x.general_reverse::<1>();
    let x = x + x.general_reverse::<2>();
    let x = x + x.general_reverse::<4>();
    assert_eq!(x.to_array(), [28, 28, 28, 28, 28, 28, 28, 28]);
}