Description
Almost every architecture we support has an instruction that performs what we refer to as a dynamic "swizzle": a lookup-table indexing into one vector, using indices that another vector supplies at runtime (also known as a variable "shuffle"). However, there is no way to express this portably in LLVM IR.
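To make the semantics concrete, here is a minimal scalar sketch of a 16-lane byte swizzle. The name `swizzle_dyn_scalar` is hypothetical, and returning 0 for out-of-range indices is an assumption borrowed from the wasm `i8x16.swizzle` instruction; other targets handle such indices differently.

```rust
/// Scalar reference for a dynamic byte swizzle (hypothetical helper).
/// Each output lane is `table[idxs[lane]]`.
/// Assumption: an index outside 0..16 yields 0, as wasm's `i8x16.swizzle` does.
fn swizzle_dyn_scalar(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (lane, &idx) in out.iter_mut().zip(idxs.iter()) {
        // `get` returns None for out-of-range indices, which we map to 0.
        *lane = table.get(idx as usize).copied().unwrap_or(0);
    }
    out
}
```

Unlike a compile-time shuffle, the index vector here is ordinary runtime data, which is exactly what LLVM IR's `shufflevector` (with its constant mask) cannot express.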
Nonetheless, the logic for lowering this to target-specific instructions should already be upstream in LLVM, in the form of the lowering for the wasm "dynamic swizzling". As we would like to use it in our API directly, it should be made sufficiently generic and available on all platforms, since functionally all of them have a reasonable equivalent. That includes x86 once you consider SSSE3 and pshufb: x86 Macs, for example, have it in their baseline target features, as would an x86-64-v3 target. Unfortunately, working in C++ is challenging to begin with, and LLVM's dialect is even more arcane.
However, we could also introduce this on our own side before there is any movement in LLVM, by choosing our own lowerings to LLVM IR: either target-specific intrinsics or a generic scalar LUT pattern. This is the worst answer for x86 compilation, though, and ideally we would just use an LLVM IR intrinsic. At least Cranelift should find adding this logic easy, as it is tilted towards serving wasm JIT compilation and this IS a wasm instruction.
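As a rough illustration of the "own lowerings" route, the sketch below pairs a target-specific path for x86-64 with a scalar LUT fallback. All function names are hypothetical, and the zero-on-out-of-range behavior is an assumption modeled on wasm's `i8x16.swizzle`. Because pshufb only zeroes a lane when the index's top bit is set, the sketch first does a saturating add of 0x70 so that every index of 16 or more ends up with that bit set.

```rust
/// Generic scalar LUT fallback (hypothetical helper).
/// Assumption: out-of-range indices produce 0.
fn scalar_lut(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (lane, &idx) in out.iter_mut().zip(idxs.iter()) {
        *lane = table.get(idx as usize).copied().unwrap_or(0);
    }
    out
}

/// Target-specific path using pshufb via `_mm_shuffle_epi8`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn swizzle_dyn_ssse3(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    use core::arch::x86_64::*;
    let t = _mm_loadu_si128(table.as_ptr() as *const __m128i);
    let i = _mm_loadu_si128(idxs.as_ptr() as *const __m128i);
    // Indices 0..=15 become 0x70..=0x7F (top bit clear, low nibble intact);
    // indices >= 16 saturate into 0x80..=0xFF, which pshufb maps to 0.
    let fixed = _mm_adds_epu8(i, _mm_set1_epi8(0x70));
    let r = _mm_shuffle_epi8(t, fixed);
    let mut out = [0u8; 16];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    out
}

/// Picks the SSSE3 path when the CPU supports it, else the scalar LUT.
fn swizzle_dyn(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: the runtime check above guarantees SSSE3 is available.
            return unsafe { swizzle_dyn_ssse3(table, idxs) };
        }
    }
    scalar_lut(table, idxs)
}
```

Runtime detection is used here only to keep the sketch runnable everywhere; a real lowering inside the compiler would instead select on the target features known at compile time.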
There was a relevant Zulip conversation here.
LLVM-side
- Propose a new LLVM IR intrinsic that generalizes the wasm swizzle-lowering mechanism
- PR it and get it merged with at least a generic desugaring
- From there, reexport the lowering for `llvm.wasm.swizzle` to that intrinsic on x86-64...
- ...AArch64...
- ...PowerPC...
- ...anything else.
- ...Make sure it can lower back to `llvm.wasm.swizzle` for wasm targets.
Rust-side
- Introduce a platform intrinsic into the backend, plus a generic LLVM IR lowering
- Pipe it through portable-simd and thus `core::simd`
- Introduce target-specific optimizations for AArch64 (optional)
- Introduce target-specific optimizations for x86-64 + SSSE3 (optional)
- Introduce target-specific optimizations for x86-64 + AVX(2?) (optional)
- Introduce target-specific optimizations for PowerPC (optional)
- Introduce target-specific optimizations for wasm (optional)