diff --git a/.gitignore b/.gitignore
index 39ad701a8883f..82d2291fd22b0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -56,3 +56,4 @@ goto-transcoder
 # already existing elements were commented out
 #/target
+testable-simd-models/target
diff --git a/testable-simd-models/Cargo.toml b/testable-simd-models/Cargo.toml
new file mode 100644
index 0000000000000..6e2116fec82e0
--- /dev/null
+++ b/testable-simd-models/Cargo.toml
@@ -0,0 +1,16 @@
+[package]
+name = "testable-simd-models"
+version = "0.0.2"
+authors = ["Cryspen"]
+license = "Apache-2.0"
+homepage = "https://github.com/cryspen/verify-rust-std/testable-simd-models"
+edition = "2021"
+repository = "https://github.com/cryspen/verify-rust-std/testable-simd-models"
+readme = "README.md"
+
+[dependencies]
+rand = "0.9"
+pastey = "0.1.0"
+
+[lints.rust]
+unexpected_cfgs = { level = "warn" }
diff --git a/testable-simd-models/README.md b/testable-simd-models/README.md
new file mode 100644
index 0000000000000..d051de6145f4a
--- /dev/null
+++ b/testable-simd-models/README.md
@@ -0,0 +1,127 @@
+# testable-simd-models
+
+This crate contains executable, independently testable specifications
+for the SIMD intrinsics provided by the `core::arch` library in Rust.
+The structure of this crate is based on [rust-lang/stdarch/crates/core_arch](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch).
+
+## Code Structure
+Within the `core_arch` folder in this crate, there is a separate
+folder for each architecture for which we have written models.
+In particular, it contains folders for `x86` and `arm_shared`.
+Each such folder has three sub-folders: `models`, `tests`, and `specs`.
+
+The `models` folder contains the models of the intrinsics, with one
+file per target feature. The models are written using the
+various abstractions implemented in `crate::abstractions`, especially
+those in `crate::abstractions::simd`. These models are meant to
+closely resemble their implementations within the Rust core itself.
+
+The `tests` folder contains the tests of these models, and is
+structured the same way as `models`. Each file additionally contains
+the definition of a macro that makes writing these tests easier. The
+tests work by testing the models against the intrinsics in the Rust
+core, trying out random inputs (generally 1000), and comparing their
+outputs.
+
+## Modeling Process
+The process of adding a specific intrinsic's model goes as follows.
+For this example, let us say the intrinsic we are adding is
+`_mm256_bsrli_epi128` from the avx2 feature set.
+
+1. We go to [rust-lang/stdarch/crates/core_arch/src/x86/](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch/src/x86/), and find the implementation of the intrinsic in `avx2.rs`.
+
+2. We see that the implementation looks like this:
+``` rust
+/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros.
+///
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128)
+#[inline]
+#[target_feature(enable = "avx2")]
+#[cfg_attr(test, assert_instr(vpsrldq, IMM8 = 1))]
+#[rustc_legacy_const_generics(1)]
+#[stable(feature = "simd_x86", since = "1.27.0")]
+pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
+    static_assert_uimm_bits!(IMM8, 8);
+    const fn mask(shift: i32, i: u32) -> u32 {
+        let shift = shift as u32 & 0xff;
+        if shift > 15 || (15 - (i % 16)) < shift {
+            0
+        } else {
+            32 + (i + shift)
+        }
+    }
+    unsafe {
+        let a = a.as_i8x32();
+        let r: i8x32 = simd_shuffle!(
+            i8x32::ZERO,
+            a,
+            [
+                mask(IMM8, 0),
+                mask(IMM8, 1),
+                mask(IMM8, 2),
+                mask(IMM8, 3),
+                ...
+                mask(IMM8, 31),
+            ],
+        );
+        transmute(r)
+    }
+}
+```
+Thus, we then go to `core_arch/x86/models/avx2.rs` and add the implementation. After some modification, it ends up looking like this.
+``` rust
+/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros.
+///
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128)
+
+pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
+    const fn mask(shift: i32, i: u32) -> u64 {
+        let shift = shift as u32 & 0xff;
+        if shift > 15 || (15 - (i % 16)) < shift {
+            0 as u64
+        } else {
+            (32 + (i + shift)) as u64
+        }
+    }
+
+    let a = BitVec::to_i8x32(a);
+    let r: i8x32 = simd_shuffle(
+        i8x32::from_fn(|_| 0),
+        a,
+        [
+            mask(IMM8, 0),
+            mask(IMM8, 1),
+            mask(IMM8, 2),
+            mask(IMM8, 3),
+            ...
+            mask(IMM8, 31),
+        ],
+    );
+    r.into()
+}
+```
+
+3. Next, we add a test for this intrinsic. For this, we navigate to `core_arch/x86/tests/avx2.rs`. Since the value of
+   `IMM8` can be up to 8 bits, we want to test constant arguments up to 255. Thus, we write the following macro invocation.
+   ```rust
+   mk!([100]_mm256_bsrli_epi128{<0>,<1>,<2>,<3>,...,<255>}(a: BitVec));
+   ```
+   Here, the `[100]` means we test 100 random inputs for each constant value. This concludes the necessary steps for modeling an intrinsic.
+
+
+## Contributing Models
+
+To contribute new models of intrinsics, we expect the author to follow
+the above steps and provide comprehensive tests. It is important that
+the model author look carefully at both the Intel/ARM specification
+and the Rust `stdarch` implementation, because the Rust implementation
+may not necessarily be correct.
+
+Indeed, the previous implementation of `_mm256_bsrli_epi128` (and a
+similar intrinsic called `_mm512_bsrli_epi128`) in `stdarch` had a
+bug, which we found during the process of modeling and testing this
+intrinsic. This bug was [reported by
+us](https://github.com/rust-lang/stdarch/issues/1822) using a failing
+test case generated from the testable model and then fixed by [our
+PR](https://github.com/rust-lang/stdarch/pull/1823) in the 2025-06-30
+version of `stdarch`.
diff --git a/testable-simd-models/src/abstractions/bit.rs b/testable-simd-models/src/abstractions/bit.rs
new file mode 100644
index 0000000000000..4fac19fdcd567
--- /dev/null
+++ b/testable-simd-models/src/abstractions/bit.rs
@@ -0,0 +1,204 @@
+//! # Bit Manipulation and Machine Integer Utilities
+//!
+//! This module provides utilities for working with individual bits and machine integer types.
+//! It defines a [`Bit`] enum to represent a single bit (`0` or `1`) along with convenient
+//! conversion implementations between `Bit`, [`bool`], and various primitive integer types.
+//!
In addition, the module introduces the [`MachineInteger`] trait which abstracts over +//! integer types, providing associated constants: +//! +//! - `BITS`: The size of the integer type in bits. +//! - `SIGNED`: A flag indicating whether the type is signed. +//! +//! The [`Bit`] type includes methods for extracting the value of a specific bit from an integer. +//! For example, [`Bit::of_int`] returns the bit at a given position for a provided integer, +//! handling both positive and negative values (assuming a two's complement representation). +//! +//! # Examples +//! +//! ```rust +//! use testable_simd_models::abstractions::bit::{Bit, MachineInteger}; +//! +//! // Extract the 3rd bit (0-indexed) from an integer. +//! let bit = Bit::of_int(42, 2); +//! println!("The extracted bit is: {:?}", bit); +//! +//! // Convert Bit to a primitive integer type. +//! let num: u8 = bit.into(); +//! println!("As an integer: {}", num); +//! ``` +//! +//! [`bool`]: https://doc.rust-lang.org/std/primitive.bool.html +//! [`Bit::of_int`]: enum.Bit.html#method.of_int + +/// Represent a bit: `0` or `1`. +#[derive(Copy, Clone, Eq, PartialEq, Debug)] +pub enum Bit { + Zero, + One, +} +impl std::ops::BitAnd for Bit { + type Output = Self; + fn bitand(self, rhs: Self) -> Self { + match self { + Bit::Zero => Bit::Zero, + Bit::One => rhs, + } + } +} + +impl std::ops::BitOr for Bit { + type Output = Self; + fn bitor(self, rhs: Self) -> Self { + match self { + Bit::Zero => rhs, + Bit::One => Bit::One, + } + } +} + +impl std::ops::BitXor for Bit { + type Output = Self; + fn bitxor(self, rhs: Self) -> Self { + match (self, rhs) { + (Bit::Zero, Bit::Zero) => Bit::Zero, + (Bit::One, Bit::One) => Bit::Zero, + _ => Bit::One, + } + } +} + +impl std::ops::Neg for Bit { + type Output = Self; + fn neg(self) -> Self { + match self { + Bit::One => Bit::Zero, + Bit::Zero => Bit::One, + } + } +} +macro_rules! generate_from_bit_impls { + ($($ty:ident),*) => { + $(impl From for $ty { + fn from(bit: Bit) -> Self { + bool::from(bit) as $ty + } + })* + }; +} +generate_from_bit_impls!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128); + +impl From for bool { + fn from(bit: Bit) -> Self { + match bit { + Bit::Zero => false, + Bit::One => true, + } + } +} + +impl From for Bit { + fn from(b: bool) -> Bit { + match b { + false => Bit::Zero, + true => Bit::One, + } + } +} + +/// A trait for types that represent machine integers. +pub trait MachineInteger { + /// The size of this integer type in bits. + fn bits() -> u32; + + /// The signedness of this integer type. + const SIGNED: bool; + /// Element of the integer type with every bit as 0. + const ZEROS: Self; + /// Element of the integer type with every bit as 1. + const ONES: Self; + /// Minimum value of the integer type. + const MIN: Self; + /// Maximum value of the integer type. + const MAX: Self; + + /// Implements functionality for `simd_add` in `crate::abstractions::simd`. + fn wrapping_add(self, rhs: Self) -> Self; + /// Implements functionality for `simd_sub` in `crate::abstractions::simd`. + fn wrapping_sub(self, rhs: Self) -> Self; + /// Implements functionality for `simd_mul` in `crate::abstractions::simd`. + fn overflowing_mul(self, rhs: Self) -> Self; + /// Implements functionality for `simd_saturating_add` in `crate::abstractions::simd`. + fn saturating_add(self, rhs: Self) -> Self; + /// Implements functionality for `simd_saturating_sub` in `crate::abstractions::simd`. 
+ fn saturating_sub(self, rhs: Self) -> Self; + /// Implements functionality for `simd_abs_diff` in `crate::abstractions::simd`. + fn absolute_diff(self, rhs: Self) -> Self; + /// Implements functionality for `simd_abs` in `crate::abstractions::simd`. + fn absolute_val(self) -> Self; +} + +macro_rules! generate_imachine_integer_impls { + ($($ty:ident),*) => { + $( + impl MachineInteger for $ty { + const SIGNED: bool = true; + const ZEROS: $ty = 0; + const ONES: $ty = -1; + const MIN: $ty = $ty::MIN; + const MAX: $ty = $ty::MAX; + fn bits() -> u32 { $ty::BITS } + fn wrapping_add(self, rhs: Self) -> Self { self.wrapping_add(rhs) } + fn wrapping_sub(self, rhs: Self) -> Self { self.wrapping_sub(rhs) } + fn overflowing_mul(self, rhs: Self) -> Self { self.overflowing_mul(rhs).0 } + fn saturating_add(self, rhs: Self) -> Self { self.saturating_add(rhs)} + fn saturating_sub(self, rhs: Self) -> Self { self.saturating_sub(rhs) } + fn absolute_diff(self, rhs: Self) -> Self {if self > rhs {$ty::wrapping_sub(self, rhs)} else {$ty::wrapping_sub(rhs, self)}} + fn absolute_val(self) -> Self {if self == $ty::MIN {self} else {self.abs()}} + })* + }; +} + +macro_rules! generate_umachine_integer_impls { + ($($ty:ident),*) => { + $( + impl MachineInteger for $ty { + const SIGNED: bool = false; + const ZEROS: $ty = 0; + const ONES: $ty = $ty::MAX; + const MIN: $ty = $ty::MIN; + const MAX: $ty = $ty::MAX; + + + fn bits() -> u32 { $ty::BITS } + fn wrapping_add(self, rhs: Self) -> Self { self.wrapping_add(rhs) } + fn wrapping_sub(self, rhs: Self) -> Self { self.wrapping_sub(rhs) } + fn overflowing_mul(self, rhs: Self) -> Self { self.overflowing_mul(rhs).0 } + fn saturating_add(self, rhs: Self) -> Self { self.saturating_add(rhs)} + fn saturating_sub(self, rhs: Self) -> Self { self.saturating_sub(rhs)} + fn absolute_diff(self, rhs: Self) -> Self {if self > rhs {self - rhs} else {rhs - self}} + fn absolute_val(self) -> Self {self} + })* + }; +} +generate_imachine_integer_impls!(i8, i16, i32, i64, i128); +generate_umachine_integer_impls!(u8, u16, u32, u64, u128); + +impl Bit { + fn of_raw_int(x: u128, nth: u32) -> Self { + if x / 2u128.pow(nth) % 2 == 1 { + Self::One + } else { + Self::Zero + } + } + + pub fn of_int + MachineInteger>(x: T, nth: u32) -> Bit { + let x: i128 = x.into(); + if x >= 0 { + Self::of_raw_int(x as u128, nth) + } else { + Self::of_raw_int((2i128.pow(T::bits()) + x) as u128, nth) + } + } +} diff --git a/testable-simd-models/src/abstractions/bitvec.rs b/testable-simd-models/src/abstractions/bitvec.rs new file mode 100644 index 0000000000000..0f3003f4beadc --- /dev/null +++ b/testable-simd-models/src/abstractions/bitvec.rs @@ -0,0 +1,155 @@ +//! This module provides a specification-friendly bit vector type. +use super::bit::{Bit, MachineInteger}; +use super::funarr::*; + +use std::fmt::Formatter; + +/// A fixed-size bit vector type. +/// +/// `BitVec` is a specification-friendly, fixed-length bit vector that internally +/// stores an array of [`Bit`] values, where each `Bit` represents a single binary digit (0 or 1). +/// +/// This type provides several utility methods for constructing and converting bit vectors: +/// +/// The [`Debug`] implementation for `BitVec` pretty-prints the bits in groups of eight, +/// making the bit pattern more human-readable. The type also implements indexing, +/// allowing for easy access to individual bits. 
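+///
+/// A small usage sketch (for illustration only; it relies solely on the
+/// constructors and conversions defined in this file):
+///
+/// ```rust
+/// use testable_simd_models::abstractions::{bit::Bit, bitvec::BitVec};
+///
+/// // Build the 8-bit pattern 0b0000_0101 by setting bits 0 and 2.
+/// let bv = BitVec::<8>::from_fn(|i| if i == 0 || i == 2 { Bit::One } else { Bit::Zero });
+/// // Individual bits are accessible through indexing...
+/// assert!(matches!(bv[2], Bit::One));
+/// // ...and the whole vector converts back into a machine integer.
+/// assert!(bv.to_int::<u8>() == 5);
+/// ```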
+#[derive(Copy, Clone, Eq, PartialEq)] +pub struct BitVec(FunArray); + +/// Pretty prints a bit slice by group of 8 +fn bit_slice_to_string(bits: &[Bit]) -> String { + bits.iter() + .map(|bit| match bit { + Bit::Zero => '0', + Bit::One => '1', + }) + .collect::>() + .chunks(8) + .map(|bits| bits.iter().collect::()) + .map(|s| format!("{s} ")) + .collect::() + .trim() + .into() +} + +impl core::fmt::Debug for BitVec { + fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), std::fmt::Error> { + write!(f, "{}", bit_slice_to_string(&self.0.as_vec())) + } +} + +impl core::ops::Index for BitVec { + type Output = Bit; + fn index(&self, index: u64) -> &Self::Output { + self.0.get(index) + } +} + +/// Convert a bit slice into an unsigned number. + +fn u128_int_from_bit_slice(bits: &[Bit]) -> u128 { + bits.iter() + .enumerate() + .map(|(i, bit)| u128::from(*bit) << i) + .sum::() +} + +/// Convert a bit slice into a machine integer of type `T`. +fn int_from_bit_slice + MachineInteger + Copy>(bits: &[Bit]) -> T { + debug_assert!(bits.len() <= T::bits() as usize); + let result = if T::SIGNED { + let is_negative = matches!(bits[T::bits() as usize - 1], Bit::One); + let s = u128_int_from_bit_slice(&bits[0..T::bits() as usize - 1]) as i128; + if is_negative { + s + (-2i128).pow(T::bits() - 1) + } else { + s + } + } else { + u128_int_from_bit_slice(bits) as i128 + }; + let Ok(n) = result.try_into() else { + // Conversion must succeed as `result` is guaranteed to be in range due to the bit-length check. + unreachable!() + }; + n +} +impl BitVec { + /// Constructor for BitVec. `BitVec::::from_fn` constructs a bitvector out of a function that takes usizes smaller than `N` and produces bits. + pub fn from_fn Bit>(f: F) -> Self { + Self(FunArray::from_fn(f)) + } + /// Convert a slice of machine integers where only the `d` least significant bits are relevant. + pub fn from_slice + MachineInteger + Copy>(x: &[T], d: u64) -> Self { + Self::from_fn(|i| Bit::of_int::(x[(i / d) as usize], (i % d) as u32)) + } + + /// Construct a BitVec out of a machine integer. + pub fn from_int + MachineInteger + Copy>(n: T) -> Self { + Self::from_slice::(&[n], T::bits() as u64) + } + + /// Convert a BitVec into a machine integer of type `T`. + pub fn to_int + MachineInteger + Copy>(self) -> T { + int_from_bit_slice(&self.0.as_vec()) + } + + /// Convert a BitVec into a vector of machine integers of type `T`. + pub fn to_vec + MachineInteger + Copy>(&self) -> Vec { + self.0 + .as_vec() + .chunks(T::bits() as usize) + .map(int_from_bit_slice) + .collect() + } + + /// Generate a random BitVec. + pub fn rand() -> Self { + use rand::prelude::*; + let random_source: Vec<_> = { + let mut rng = rand::rng(); + (0..N).map(|_| rng.random::()).collect() + }; + Self::from_fn(|i| random_source[i as usize].into()) + } +} + +impl BitVec { + pub fn chunked_shift( + self, + shl: FunArray, + ) -> BitVec { + fn chunked_shift( + bitvec: BitVec, + shl: FunArray, + ) -> BitVec { + BitVec::from_fn(|i| { + let nth_bit = i % CHUNK; + let nth_chunk = i / CHUNK; + let shift: i128 = if nth_chunk < SHIFTS { + shl[nth_chunk] + } else { + 0 + }; + let local_index = (nth_bit as i128).wrapping_sub(shift); + if local_index < CHUNK as i128 && local_index >= 0 { + let local_index = local_index as u64; + bitvec[nth_chunk * CHUNK + local_index] + } else { + Bit::Zero + } + }) + } + chunked_shift::(self, shl) + } + + /// Folds over the array, accumulating a result. + /// + /// # Arguments + /// * `init` - The initial value of the accumulator. 
+ /// * `f` - A function combining the accumulator and each element. + pub fn fold(&self, init: A, f: fn(A, Bit) -> A) -> A { + self.0.fold(init, f) + } +} diff --git a/testable-simd-models/src/abstractions/funarr.rs b/testable-simd-models/src/abstractions/funarr.rs new file mode 100644 index 0000000000000..4c120addcb0c5 --- /dev/null +++ b/testable-simd-models/src/abstractions/funarr.rs @@ -0,0 +1,79 @@ +//! This module implements a fixed-size array wrapper with functional semantics +//! which are used in formulating abstractions. + +/// `FunArray` represents an array of `T` values of length `N`, where `N` is a compile-time constant. +/// Internally, it uses a fixed-length array of `Option` with a maximum capacity of 512 elements. +/// Unused elements beyond `N` are filled with `None`. +#[derive(Copy, Clone, Eq, PartialEq)] +pub struct FunArray([Option; 512]); + +impl FunArray { + /// Gets a reference to the element at index `i`. + pub fn get(&self, i: u64) -> &T { + self.0[i as usize].as_ref().unwrap() + } + /// Constructor for FunArray. `FunArray::from_fn` constructs a funarray out of a function that takes usizes smaller than `N` and produces an element of type T. + pub fn from_fn T>(f: F) -> Self { + // let vec = (0..N).map(f).collect(); + let arr = core::array::from_fn(|i| { + if (i as u64) < N { + Some(f(i as u64)) + } else { + None + } + }); + Self(arr) + } + + /// Converts the `FunArray` into a `Vec`. + pub fn as_vec(&self) -> Vec + where + T: Clone, + { + self.0[0..(N as usize)] + .iter() + .cloned() + .map(|x| x.unwrap()) + .collect() + } + + /// Folds over the array, accumulating a result. + /// + /// # Arguments + /// * `init` - The initial value of the accumulator. + /// * `f` - A function combining the accumulator and each element. + pub fn fold(&self, mut init: A, f: fn(A, T) -> A) -> A + where + T: Clone, + { + for i in 0..N { + init = f(init, self[i].clone()); + } + init + } +} + +impl TryFrom> for FunArray { + type Error = (); + fn try_from(v: Vec) -> Result { + if (v.len() as u64) < N { + Err(()) + } else { + Ok(Self::from_fn(|i| v[i as usize].clone())) + } + } +} + +impl core::fmt::Debug for FunArray { + fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { + write!(f, "{:?}", self.as_vec()) + } +} + +impl core::ops::Index for FunArray { + type Output = T; + + fn index(&self, index: u64) -> &Self::Output { + self.get(index) + } +} diff --git a/testable-simd-models/src/abstractions/mod.rs b/testable-simd-models/src/abstractions/mod.rs new file mode 100644 index 0000000000000..b3018a8189569 --- /dev/null +++ b/testable-simd-models/src/abstractions/mod.rs @@ -0,0 +1,26 @@ +//! This module provides abstractions that are useful for writing +//! specifications for the intrinsics. Currently it provides two abstractions: bits and +//! bit vectors. +//! +//! # Examples +//! +//! Converting an integer to a bit vector and back: +//! +//! ```rust +//! use testable_simd_models::abstractions::{bit::{Bit, MachineInteger}, bitvec::BitVec}; +//! +//! // Create a BitVec from a machine integer (using the integer's bit-width) +//! let bv = BitVec::<16>::from_int(42u16); +//! println!("BitVec: {:?}", bv); +//! +//! // Convert the BitVec back into a machine integer +//! let n: u16 = bv.to_int(); +//! println!("Integer: {}", n); +//! +//! assert!(n == 42); +//! 
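+//!
+//! // A further round-trip, sketched with the `to_vec` helper from `bitvec`:
+//! // the same 16-bit vector splits into two bytes, least-significant first.
+//! let bytes: Vec<u8> = bv.to_vec();
+//! assert!(bytes == vec![42u8, 0]);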
``` + +pub mod bit; +pub mod bitvec; +pub mod funarr; +pub mod simd; diff --git a/testable-simd-models/src/abstractions/simd.rs b/testable-simd-models/src/abstractions/simd.rs new file mode 100644 index 0000000000000..08b1b21bce34d --- /dev/null +++ b/testable-simd-models/src/abstractions/simd.rs @@ -0,0 +1,938 @@ +//! Models of SIMD compiler intrinsics. +//! +//! Operations are defined on FunArrs. + +use crate::abstractions::{bit::MachineInteger, bitvec::*, funarr::*}; +use std::convert::*; +use std::ops::*; + +#[allow(dead_code)] +/// Derives interpretations functions, and type synonyms. +macro_rules! interpretations { +($n:literal; $($name:ident [$ty:ty; $m:literal]),*) => { + $( + #[doc = concat!(stringify!($ty), " vectors of size ", stringify!($m))] + #[allow(non_camel_case_types)] + pub type $name = FunArray<$m, $ty>; + pastey::paste! { + const _: () = { + impl BitVec<$n> { + #[doc = concat!("Conversion from ", stringify!($ty), " vectors of size ", stringify!($m), "to bit vectors of size ", stringify!($n))] + pub fn [< from_ $name >](iv: $name) -> BitVec<$n> { + let vec: Vec<$ty> = iv.as_vec(); + Self::from_slice(&vec[..], <$ty>::bits() as u64) + } + #[doc = concat!("Conversion from bit vectors of size ", stringify!($n), " to ", stringify!($ty), " vectors of size ", stringify!($m))] + pub fn [< to_ $name >](bv: BitVec<$n>) -> $name { + let vec: Vec<$ty> = bv.to_vec(); + $name::from_fn(|i| vec[i as usize]) + } + + + } + + + impl From> for $name { + fn from(bv: BitVec<$n>) -> Self { + BitVec::[< to_ $name >](bv) + } + } + + impl From<$name> for BitVec<$n> { + fn from(iv: $name) -> Self { + BitVec::[< from_ $name >](iv) + } + } + + impl $name { + + pub fn splat(value: $ty) -> Self { + FunArray::from_fn(|_| value) + } + } + }; + } + )* +}; +} + +interpretations!(256; i32x8 [i32; 8], i64x4 [i64; 4], i16x16 [i16; 16], i128x2 [i128; 2], i8x32 [i8; 32], + u32x8 [u32; 8], u64x4 [u64; 4], u16x16 [u16; 16], u8x32 [u8; 32]); +interpretations!(128; i32x4 [i32; 4], i64x2 [i64; 2], i16x8 [i16; 8], i128x1 [i128; 1], i8x16 [i8; 16], + u32x4 [u32; 4], u64x2 [u64; 2], u16x8 [u16; 8], u8x16 [u8; 16]); + +interpretations!(512; u32x16 [u32; 16], u16x32 [u16; 32], i32x16 [i32; 16], i16x32 [i16; 32]); +interpretations!(64; i64x1 [i64; 1], i32x2 [i32; 2], i16x4 [i16; 4], i8x8 [i8; 8], u64x1 [u64; 1], u32x2 [u32; 2],u16x4 [u16; 4], u8x8 [u8; 8]); +interpretations!(32; i8x4 [i8; 4], u8x4 [u8; 4]); + + +/// Inserts an element into a vector, returning the updated vector. +/// +/// # Safety +/// +/// `idx` must be in-bounds of the vector, ie. idx < N + +pub fn simd_insert(x: FunArray, idx: u64, val: T) -> FunArray { + FunArray::from_fn(|i| if i == idx { val } else { x[i] }) +} + +/// Extracts an element from a vector. +/// +/// # Safety +/// +/// `idx` must be in-bounds of the vector, ie. idx < N +pub fn simd_extract(x: FunArray, idx: u64) -> T { + x.get(idx).clone() +} + +/// Adds two vectors elementwise with wrapping on overflow/underflow. +pub fn simd_add( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| (x[i].wrapping_add(y[i]))) +} + +/// Subtracts `rhs` from `lhs` elementwise with wrapping on overflow/underflow. +pub fn simd_sub( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| (x[i].wrapping_sub(y[i]))) +} + +/// Multiplies two vectors elementwise with wrapping on overflow/underflow. +pub fn simd_mul( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| (x[i].overflowing_mul(y[i]))) +} + +/// Produces the elementwise absolute values. 
+/// For vectors of unsigned integers it returns the vector untouched. +/// If the element is the minimum value of a signed integer, it returns the element as is. +pub fn simd_abs(x: FunArray) -> FunArray { + FunArray::from_fn(|i| x[i].absolute_val()) +} + +/// Produces the elementwise absolute difference of two vectors. +/// Note: Absolute difference in this case is simply the element with the smaller value subtracted from the element with the larger value, with overflow/underflow. +/// For example, if the elements are i8, the absolute difference of 255 and -2 is -255. +pub fn simd_abs_diff( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| (x[i].absolute_diff(y[i]))) +} + +/// Shifts vector left elementwise, with UB on overflow. +/// +/// # Safety +/// +/// Each element of `rhs` must be less than `::BITS`. +pub fn simd_shl( + x: FunArray, + y: FunArray, +) -> FunArray::Output> { + FunArray::from_fn(|i| (x[i] << y[i])) +} + +/// Shifts vector right elementwise, with UB on overflow. +/// +/// Shifts `lhs` right by `rhs`, shifting in sign bits for signed types. +/// +/// # Safety +/// +/// Each element of `rhs` must be less than `::BITS`. + +pub fn simd_shr( + x: FunArray, + y: FunArray, +) -> FunArray::Output> { + FunArray::from_fn(|i| (x[i] >> y[i])) +} + +/// "Ands" vectors elementwise. + +pub fn simd_and( + x: FunArray, + y: FunArray, +) -> FunArray::Output> { + FunArray::from_fn(|i| (x[i] & y[i])) +} + +/// "Ors" vectors elementwise. + +pub fn simd_or( + x: FunArray, + y: FunArray, +) -> FunArray::Output> { + FunArray::from_fn(|i| (x[i] | y[i])) +} + +/// "Exclusive ors" vectors elementwise. + +pub fn simd_xor( + x: FunArray, + y: FunArray, +) -> FunArray::Output> { + FunArray::from_fn(|i| (x[i] ^ y[i])) +} + +pub trait CastsFrom { + fn cast(a: T) -> Self; +} +pub trait TruncateFrom { + /// Truncates into [`Self`] from a larger integer + fn truncate_from(v: T) -> Self; +} + +macro_rules! from_impls{ + ($([$ty1:ty, $ty2: ty]),*) => { + $( + impl CastsFrom<$ty2> for $ty1 { + fn cast(a: $ty2) -> $ty1 { + <$ty1>::from(a) + } + } + )* + }; +} +macro_rules! truncate_from_order { + ($t:ty, $($from:ty),+) => { + $( + impl TruncateFrom<$from> for $t { + #[inline] + fn truncate_from(v: $from) -> $t { v as $t } + } + )* + truncate_from_order!($($from),+); + }; + + ($t:ty) => {}; +} +truncate_from_order!(u8, u16, u32, u64, u128); +truncate_from_order!(i8, i16, i32, i64, i128); + +macro_rules! truncate_from_impls{ + ($([$ty1:ty, $ty2: ty]),*) => { + $( + impl CastsFrom<$ty2> for $ty1 { + fn cast(a: $ty2) -> $ty1 { + <$ty1>::truncate_from(a) + } + } + )* + }; +} + +macro_rules! symm_impls{ + ($([$ty1:ty, $ty2: ty]),*) => { + $( + impl CastsFrom<$ty2> for $ty1 { + fn cast(a: $ty2) -> $ty1 { + a as $ty1 + } + } + impl CastsFrom<$ty1> for $ty2 { + fn cast(a: $ty1) -> $ty2 { + a as $ty2 + } + } + )* + }; +} +macro_rules! 
self_impls{ + ($($ty1:ty),*) => { + $( + impl CastsFrom<$ty1> for $ty1 { + fn cast(a: $ty1) -> $ty1 { + a + } + } + + )* + }; +} +from_impls!( + [u16, u8], + [u32, u8], + [u32, u16], + [u64, u8], + [u64, u16], + [u64, u32], + [u128, u8], + [u128, u16], + [u128, u32], + [u128, u64], + [i16, i8], + [i32, i8], + [i32, i16], + [i64, i8], + [i64, i16], + [i64, i32], + [i128, i8], + [i128, i16], + [i128, i32], + [i128, i64] +); +truncate_from_impls!( + [u8, u16], + [u8, u32], + [u16, u32], + [u8, u64], + [u16, u64], + [u32, u64], + [u8, u128], + [u16, u128], + [u32, u128], + [u64, u128], + [i8, i16], + [i8, i32], + [i16, i32], + [i8, i64], + [i16, i64], + [i32, i64], + [i8, i128], + [i16, i128], + [i32, i128], + [i64, i128] +); + +symm_impls!([u8, i8], [u16, i16], [u32, i32], [u64, i64], [u128, i128]); + +self_impls!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128); + +// Would like to do the below instead of using the above macros, but currently this is an active issue in Rust (#31844) +// impl CastsFrom for U +// where +// U : From { +// fn cast(a: T) -> U { +// U::from(a) +// } +// } + +// impl CastsFrom for U +// where +// U : TruncateFrom { +// fn cast(a: T) -> U { +// U::truncate_from(a) +// } +// } + +/// Numerically casts a vector, elementwise. +/// +/// Casting can only happen between two integers of the same signedness. +/// +/// When casting from a wider number to a smaller number, the higher bits are removed. +/// Otherwise, it extends the number, following signedness. +pub fn simd_cast>(x: FunArray) -> FunArray { + FunArray::from_fn(|i| T2::cast(x[i])) +} + +/// Negates a vector elementwise. +/// +/// Rust panics for `-::Min` due to overflow, but here, it just returns the element as is. + +pub fn simd_neg::Output> + MachineInteger + Eq + Neg + Copy>( + x: FunArray, +) -> FunArray { + FunArray::from_fn(|i| { + if x[i] == T::MIN { + T::MIN + } else { + T::from(-x[i]) + } + }) +} +/// Tests elementwise equality of two vectors. +/// +/// Returns `0` (all zeros) for false and `!0` (all ones) for true. + +pub fn simd_eq( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| if x[i] == y[i] { T::ONES } else { T::ZEROS }) +} + +/// Tests elementwise inequality equality of two vectors. +/// +/// Returns `0` (all zeros) for false and `!0` (all ones) for true. + +pub fn simd_ne( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| if x[i] != y[i] { T::ONES } else { T::ZEROS }) +} + +/// Tests if `x` is less than `y`, elementwise. +/// +/// Returns `0` (all zeros) for false and `!0` (all ones) for true. + +pub fn simd_lt( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| if x[i] < y[i] { T::ONES } else { T::ZEROS }) +} + +/// Tests if `x` is less than or equal to `y`, elementwise. +/// +/// Returns `0` (all zeros) for false and `!0` (all ones) for true. + +pub fn simd_le( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| if x[i] <= y[i] { T::ONES } else { T::ZEROS }) +} + +/// Tests if `x` is greater than `y`, elementwise. +/// +/// Returns `0` (all zeros) for false and `!0` (all ones) for true. + +pub fn simd_gt( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| if x[i] > y[i] { T::ONES } else { T::ZEROS }) +} + +/// Tests if `x` is greater than or equal to `y`, elementwise. +/// +/// Returns `0` (all zeros) for false and `!0` (all ones) for true. 
+ +pub fn simd_ge( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| if x[i] >= y[i] { T::ONES } else { T::ZEROS }) +} + +/// Shuffles two vectors by the indices in idx. +/// +/// For safety, `N2 <= N1 + N3` must hold. +pub fn simd_shuffle( + x: FunArray, + y: FunArray, + idx: [u64; N2], +) -> FunArray { + FunArray::from_fn(|i| { + let i = idx[i as usize]; + if i < N1 { + x[i] + } else { + y[i - N1] + } + }) +} + +/// Adds two vectors elementwise, with saturation. + +pub fn simd_saturating_add( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| x[i].saturating_add(y[i])) +} + +/// Subtracts `y` from `x` elementwise, with saturation. + +pub fn simd_saturating_sub( + x: FunArray, + y: FunArray, +) -> FunArray { + FunArray::from_fn(|i| x[i].saturating_sub(y[i])) +} + +/// Truncates an integer vector to a bitmask. +/// Macro for that expands to an expression which is equivalent to truncating an integer vector to a bitmask, as it would on little endian systems. +/// +/// The macro takes 3 arguments. +/// The first is the highest index of the vector. +/// The second is the vector itself, which should just contain `0` and `!0`. +/// The third is the type to which the truncation happens, which should be atleast as wide as the number of elements in the vector. +/// +/// Thus for example, to truncate the vector, +/// `let a : i32 = [!0, 0, 0, 0, 0, 0, 0, 0, !0, !0, 0, 0, 0, 0, !0, 0]` +/// to u16, you would call, +/// `simd_bitmask_little!(15, a, u16)` +/// to get, +/// `0b0100001100000001u16` +/// +/// # Safety +/// The second argument must be a vector of signed integer types. +/// The length of the vector must be 64 at most. + +// The numbers in here are powers of 2. If it is needed to extend the length of the vector, simply add more cases in the same manner. +// The reason for doing this is that the expression becomes easier to work with when compiled for a proof assistant. +macro_rules! 
simd_bitmask_little { + (63, $a:ident, $ty:ty) => { + 9223372036854775808 * ((if $a[63] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(62, $a, $ty) + }; + (62, $a:ident, $ty:ty) => { + 4611686018427387904 * ((if $a[62] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(61, $a, $ty) + }; + (61, $a:ident, $ty:ty) => { + 2305843009213693952 * ((if $a[61] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(60, $a, $ty) + }; + (60, $a:ident, $ty:ty) => { + 1152921504606846976 * ((if $a[60] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(59, $a, $ty) + }; + (59, $a:ident, $ty:ty) => { + 576460752303423488 * ((if $a[59] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(58, $a, $ty) + }; + (58, $a:ident, $ty:ty) => { + 288230376151711744 * ((if $a[58] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(57, $a, $ty) + }; + (57, $a:ident, $ty:ty) => { + 144115188075855872 * ((if $a[57] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(56, $a, $ty) + }; + (56, $a:ident, $ty:ty) => { + 72057594037927936 * ((if $a[56] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(55, $a, $ty) + }; + (55, $a:ident, $ty:ty) => { + 36028797018963968 * ((if $a[55] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(54, $a, $ty) + }; + (54, $a:ident, $ty:ty) => { + 18014398509481984 * ((if $a[54] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(53, $a, $ty) + }; + (53, $a:ident, $ty:ty) => { + 9007199254740992 * ((if $a[53] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(52, $a, $ty) + }; + (52, $a:ident, $ty:ty) => { + 4503599627370496 * ((if $a[52] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(51, $a, $ty) + }; + (51, $a:ident, $ty:ty) => { + 2251799813685248 * ((if $a[51] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(50, $a, $ty) + }; + (50, $a:ident, $ty:ty) => { + 1125899906842624 * ((if $a[50] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(49, $a, $ty) + }; + (49, $a:ident, $ty:ty) => { + 562949953421312 * ((if $a[49] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(48, $a, $ty) + }; + (48, $a:ident, $ty:ty) => { + 281474976710656 * ((if $a[48] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(47, $a, $ty) + }; + (47, $a:ident, $ty:ty) => { + 140737488355328 * ((if $a[47] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(46, $a, $ty) + }; + (46, $a:ident, $ty:ty) => { + 70368744177664 * ((if $a[46] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(45, $a, $ty) + }; + (45, $a:ident, $ty:ty) => { + 35184372088832 * ((if $a[45] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(44, $a, $ty) + }; + (44, $a:ident, $ty:ty) => { + 17592186044416 * ((if $a[44] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(43, $a, $ty) + }; + (43, $a:ident, $ty:ty) => { + 8796093022208 * ((if $a[43] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(42, $a, $ty) + }; + (42, $a:ident, $ty:ty) => { + 4398046511104 * ((if $a[42] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(41, $a, $ty) + }; + (41, $a:ident, $ty:ty) => { + 2199023255552 * ((if $a[41] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(40, $a, $ty) + }; + (40, $a:ident, $ty:ty) => { + 1099511627776 * ((if $a[40] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_little!(39, $a, $ty) + }; + (39, $a:ident, $ty:ty) => { + 549755813888 * ((if $a[39] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(38, $a, $ty) + }; + (38, $a:ident, $ty:ty) => { + 274877906944 * ((if $a[38] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(37, $a, $ty) + }; + (37, 
$a:ident, $ty:ty) => { + 137438953472 * ((if $a[37] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(36, $a, $ty) + }; + (36, $a:ident, $ty:ty) => { + 68719476736 * ((if $a[36] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(35, $a, $ty) + }; + (35, $a:ident, $ty:ty) => { + 34359738368 * ((if $a[35] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(34, $a, $ty) + }; + (34, $a:ident, $ty:ty) => { + 17179869184 * ((if $a[34] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(33, $a, $ty) + }; + (33, $a:ident, $ty:ty) => { + 8589934592 * ((if $a[33] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(32, $a, $ty) + }; + (32, $a:ident, $ty:ty) => { + 4294967296 * ((if $a[32] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(31, $a, $ty) + }; + (31, $a:ident, $ty:ty) => { + 2147483648 * ((if $a[31] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(30, $a, $ty) + }; + (30, $a:ident, $ty:ty) => { + 1073741824 * ((if $a[30] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(29, $a, $ty) + }; + (29, $a:ident, $ty:ty) => { + 536870912 * ((if $a[29] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(28, $a, $ty) + }; + (28, $a:ident, $ty:ty) => { + 268435456 * ((if $a[28] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(27, $a, $ty) + }; + (27, $a:ident, $ty:ty) => { + 134217728 * ((if $a[27] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(26, $a, $ty) + }; + (26, $a:ident, $ty:ty) => { + 67108864 * ((if $a[26] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(25, $a, $ty) + }; + (25, $a:ident, $ty:ty) => { + 33554432 * ((if $a[25] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(24, $a, $ty) + }; + (24, $a:ident, $ty:ty) => { + 16777216 * ((if $a[24] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(23, $a, $ty) + }; + (23, $a:ident, $ty:ty) => { + 8388608 * ((if $a[23] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(22, $a, $ty) + }; + (22, $a:ident, $ty:ty) => { + 4194304 * ((if $a[22] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(21, $a, $ty) + }; + (21, $a:ident, $ty:ty) => { + 2097152 * ((if $a[21] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(20, $a, $ty) + }; + (20, $a:ident, $ty:ty) => { + 1048576 * ((if $a[20] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(19, $a, $ty) + }; + (19, $a:ident, $ty:ty) => { + 524288 * ((if $a[19] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(18, $a, $ty) + }; + (18, $a:ident, $ty:ty) => { + 262144 * ((if $a[18] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(17, $a, $ty) + }; + (17, $a:ident, $ty:ty) => { + 131072 * ((if $a[17] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(16, $a, $ty) + }; + (16, $a:ident, $ty:ty) => { + 65536 * ((if $a[16] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(15, $a, $ty) + }; + (15, $a:ident, $ty:ty) => { + 32768 * ((if $a[15] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(14, $a, $ty) + }; + (14, $a:ident, $ty:ty) => { + 16384 * ((if $a[14] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(13, $a, $ty) + }; + (13, $a:ident, $ty:ty) => { + 8192 * ((if $a[13] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(12, $a, $ty) + }; + (12, $a:ident, $ty:ty) => { + 4096 * ((if $a[12] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(11, $a, $ty) + }; + (11, $a:ident, $ty:ty) => { + 2048 * ((if $a[11] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(10, $a, $ty) + }; + (10, $a:ident, $ty:ty) => { + 1024 * ((if $a[10] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(9, $a, $ty) + }; + (9, $a:ident, $ty:ty) => { + 512 * ((if 
$a[9] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(8, $a, $ty) + }; + (8, $a:ident, $ty:ty) => { + 256 * ((if $a[8] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(7, $a, $ty) + }; + (7, $a:ident, $ty:ty) => { + 128 * ((if $a[7] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(6, $a, $ty) + }; + (6, $a:ident, $ty:ty) => { + 64 * ((if $a[6] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(5, $a, $ty) + }; + (5, $a:ident, $ty:ty) => { + 32 * ((if $a[5] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(4, $a, $ty) + }; + (4, $a:ident, $ty:ty) => { + 16 * ((if $a[4] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(3, $a, $ty) + }; + (3, $a:ident, $ty:ty) => { + 8 * ((if $a[3] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(2, $a, $ty) + }; + (2, $a:ident, $ty:ty) => { + 4 * ((if $a[2] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(1, $a, $ty) + }; + (1, $a:ident, $ty:ty) => { + 2 * ((if $a[1] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_little!(0, $a, $ty) + }; + (0, $a:ident, $ty:ty) => { + ((if $a[0] < 0 { 1 } else { 0 }) as $ty) + }; +} +pub(crate) use simd_bitmask_little; + +/// Truncates an integer vector to a bitmask. +/// Macro for that expands to an expression which is equivalent to truncating an integer vector to a bitmask, as it would on big endian systems. +/// +/// The macro takes 3 arguments. +/// The first is the highest index of the vector. +/// The second is the vector itself, which should just contain `0` and `!0`. +/// The third is the type to which the truncation happens, which should be atleast as wide as the number of elements in the vector. +/// +/// Thus for example, to truncate the vector, +/// `let a : i32 = [!0, 0, 0, 0, 0, 0, 0, 0, !0, !0, 0, 0, 0, 0, !0, 0]` +/// to u16, you would call, +/// `simd_bitmask_big!(15, a, u16)` +/// to get, +/// `0b1000000011000010u16` +/// +/// # Safety +/// The second argument must be a vector of signed integer types. + +#[allow(unused)] +macro_rules! 
simd_bitmask_big { + (63, $a:ident, $ty:ty) => { + 1 * ((if $a[63] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(62, $a, $ty) + }; + (62, $a:ident, $ty:ty) => { + 2 * ((if $a[62] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(61, $a, $ty) + }; + (61, $a:ident, $ty:ty) => { + 4 * ((if $a[61] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(60, $a, $ty) + }; + (60, $a:ident, $ty:ty) => { + 8 * ((if $a[60] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(59, $a, $ty) + }; + (59, $a:ident, $ty:ty) => { + 16 * ((if $a[59] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(58, $a, $ty) + }; + (58, $a:ident, $ty:ty) => { + 32 * ((if $a[58] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(57, $a, $ty) + }; + (57, $a:ident, $ty:ty) => { + 64 * ((if $a[57] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(56, $a, $ty) + }; + (56, $a:ident, $ty:ty) => { + 128 * ((if $a[56] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(55, $a, $ty) + }; + (55, $a:ident, $ty:ty) => { + 256 * ((if $a[55] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(54, $a, $ty) + }; + (54, $a:ident, $ty:ty) => { + 512 * ((if $a[54] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(53, $a, $ty) + }; + (53, $a:ident, $ty:ty) => { + 1024 * ((if $a[53] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(52, $a, $ty) + }; + (52, $a:ident, $ty:ty) => { + 2048 * ((if $a[52] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(51, $a, $ty) + }; + (51, $a:ident, $ty:ty) => { + 4096 * ((if $a[51] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(50, $a, $ty) + }; + (50, $a:ident, $ty:ty) => { + 8192 * ((if $a[50] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(49, $a, $ty) + }; + (49, $a:ident, $ty:ty) => { + 16384 * ((if $a[49] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(48, $a, $ty) + }; + (48, $a:ident, $ty:ty) => { + 32768 * ((if $a[48] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(47, $a, $ty) + }; + (47, $a:ident, $ty:ty) => { + 65536 * ((if $a[47] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(46, $a, $ty) + }; + (46, $a:ident, $ty:ty) => { + 131072 * ((if $a[46] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(45, $a, $ty) + }; + (45, $a:ident, $ty:ty) => { + 262144 * ((if $a[45] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(44, $a, $ty) + }; + (44, $a:ident, $ty:ty) => { + 524288 * ((if $a[44] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(43, $a, $ty) + }; + (43, $a:ident, $ty:ty) => { + 1048576 * ((if $a[43] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(42, $a, $ty) + }; + (42, $a:ident, $ty:ty) => { + 2097152 * ((if $a[42] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(41, $a, $ty) + }; + (41, $a:ident, $ty:ty) => { + 4194304 * ((if $a[41] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(40, $a, $ty) + }; + (40, $a:ident, $ty:ty) => { + 8388608 * ((if $a[40] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(39, $a, $ty) + }; + (39, $a:ident, $ty:ty) => { + 16777216 * ((if $a[39] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(38, $a, $ty) + }; + (38, $a:ident, $ty:ty) => { + 33554432 * ((if $a[38] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(37, $a, $ty) + }; + (37, $a:ident, $ty:ty) => { + 67108864 * ((if $a[37] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(36, $a, $ty) + }; + (36, $a:ident, $ty:ty) => { + 134217728 * ((if $a[36] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(35, $a, $ty) + }; + (35, $a:ident, $ty:ty) => { + 268435456 * ((if $a[35] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(34, $a, $ty) + }; + (34, $a:ident, $ty:ty) => { + 536870912 * ((if $a[34] < 0 { 
1 } else { 0 }) as $ty) + simd_bitmask_big!(33, $a, $ty) + }; + (33, $a:ident, $ty:ty) => { + 1073741824 * ((if $a[33] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(32, $a, $ty) + }; + (32, $a:ident, $ty:ty) => { + 2147483648 * ((if $a[32] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(31, $a, $ty) + }; + (31, $a:ident, $ty:ty) => { + 4294967296 * ((if $a[31] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(30, $a, $ty) + }; + (30, $a:ident, $ty:ty) => { + 8589934592 * ((if $a[30] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(29, $a, $ty) + }; + (29, $a:ident, $ty:ty) => { + 17179869184 * ((if $a[29] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(28, $a, $ty) + }; + (28, $a:ident, $ty:ty) => { + 34359738368 * ((if $a[28] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(27, $a, $ty) + }; + (27, $a:ident, $ty:ty) => { + 68719476736 * ((if $a[27] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(26, $a, $ty) + }; + (26, $a:ident, $ty:ty) => { + 137438953472 * ((if $a[26] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(25, $a, $ty) + }; + (25, $a:ident, $ty:ty) => { + 274877906944 * ((if $a[25] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(24, $a, $ty) + }; + (24, $a:ident, $ty:ty) => { + 549755813888 * ((if $a[24] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(23, $a, $ty) + }; + (23, $a:ident, $ty:ty) => { + 1099511627776 * ((if $a[23] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(22, $a, $ty) + }; + (22, $a:ident, $ty:ty) => { + 2199023255552 * ((if $a[22] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(21, $a, $ty) + }; + (21, $a:ident, $ty:ty) => { + 4398046511104 * ((if $a[21] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(20, $a, $ty) + }; + (20, $a:ident, $ty:ty) => { + 8796093022208 * ((if $a[20] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(19, $a, $ty) + }; + (19, $a:ident, $ty:ty) => { + 17592186044416 * ((if $a[19] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(18, $a, $ty) + }; + (18, $a:ident, $ty:ty) => { + 35184372088832 * ((if $a[18] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(17, $a, $ty) + }; + (17, $a:ident, $ty:ty) => { + 70368744177664 * ((if $a[17] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(16, $a, $ty) + }; + (16, $a:ident, $ty:ty) => { + 140737488355328 * ((if $a[16] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(15, $a, $ty) + }; + (15, $a:ident, $ty:ty) => { + 281474976710656 * ((if $a[15] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(14, $a, $ty) + }; + (14, $a:ident, $ty:ty) => { + 562949953421312 * ((if $a[14] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(13, $a, $ty) + }; + (13, $a:ident, $ty:ty) => { + 1125899906842624 * ((if $a[13] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(12, $a, $ty) + }; + (12, $a:ident, $ty:ty) => { + 2251799813685248 * ((if $a[12] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(11, $a, $ty) + }; + (11, $a:ident, $ty:ty) => { + 4503599627370496 * ((if $a[11] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(10, $a, $ty) + }; + (10, $a:ident, $ty:ty) => { + 9007199254740992 * ((if $a[10] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(9, $a, $ty) + }; + (9, $a:ident, $ty:ty) => { + 18014398509481984 * ((if $a[9] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(8, $a, $ty) + }; + (8, $a:ident, $ty:ty) => { + 36028797018963968 * ((if $a[8] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(7, $a, $ty) + }; + (7, $a:ident, $ty:ty) => { + 72057594037927936 * ((if $a[7] < 0 { 1 } else { 0 }) as $ty) + simd_bitmask_big!(6, $a, $ty) + }; + (6, $a:ident, $ty:ty) => { 
+ 144115188075855872 * ((if $a[6] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(5, $a, $ty) + }; + (5, $a:ident, $ty:ty) => { + 288230376151711744 * ((if $a[5] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(4, $a, $ty) + }; + (4, $a:ident, $ty:ty) => { + 576460752303423488 * ((if $a[4] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(3, $a, $ty) + }; + (3, $a:ident, $ty:ty) => { + 1152921504606846976 * ((if $a[3] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(2, $a, $ty) + }; + (2, $a:ident, $ty:ty) => { + 2305843009213693952 * ((if $a[2] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(1, $a, $ty) + }; + (1, $a:ident, $ty:ty) => { + 4611686018427387904 * ((if $a[1] < 0 { 1 } else { 0 }) as $ty) + + simd_bitmask_big!(0, $a, $ty) + }; + (0, $a:ident, $ty:ty) => { + 9223372036854775808 * ((if $a[0] < 0 { 1 } else { 0 }) as $ty) + }; +} +#[allow(unused)] +pub(crate) use simd_bitmask_big; + +/// Selects elements from a mask. +/// +/// For each element, if the corresponding value in `mask` is `!0`, select the element from +/// `if_true`. If the corresponding value in `mask` is `0`, select the element from +/// `if_false`. +/// +/// # Safety +/// `mask` must only contain `0` and `!0`. + +pub fn simd_select( + mask: FunArray, + if_true: FunArray, + if_false: FunArray, +) -> FunArray { + FunArray::from_fn(|i| { + if mask[i] == T1::ONES { + if_true[i] + } else { + if_false[i] + } + }) +} diff --git a/testable-simd-models/src/core_arch.rs b/testable-simd-models/src/core_arch.rs new file mode 100644 index 0000000000000..19e643885f4ce --- /dev/null +++ b/testable-simd-models/src/core_arch.rs @@ -0,0 +1,5 @@ +/// This is a (partial) mirror of [`core::arch`] +pub mod x86; +pub use x86 as x86_64; + +pub mod arm_shared; diff --git a/testable-simd-models/src/core_arch/arm_shared/mod.rs b/testable-simd-models/src/core_arch/arm_shared/mod.rs new file mode 100644 index 0000000000000..6e2272ec0e50a --- /dev/null +++ b/testable-simd-models/src/core_arch/arm_shared/mod.rs @@ -0,0 +1,4 @@ +pub mod models; +#[cfg(test)] +#[cfg(any(target_arch = "arm", target_arch = "aarch64"))] +pub mod tests; diff --git a/testable-simd-models/src/core_arch/arm_shared/models/mod.rs b/testable-simd-models/src/core_arch/arm_shared/models/mod.rs new file mode 100644 index 0000000000000..fb7844c6d0441 --- /dev/null +++ b/testable-simd-models/src/core_arch/arm_shared/models/mod.rs @@ -0,0 +1,44 @@ +//! Rust models for ARM intrinsics. +//! +//! This module contains models for the intrinsics as they are defined in the Rust core. +//! Since this is supposed to model the Rust core, the implemented functions must +//! mirror the Rust implementations as closely as they can. +//! +//! For example, calls to simd functions like simd_add and simd_sub are left as is, +//! with their implementations defined in `crate::abstractions::simd`. Some other +//! operations like simd_cast or simd_shuffle might need a little modification +//! for correct compilation. +//! +//! Calls to transmute are replaced with either an explicit call to a `BitVec::from_ function`, +//! or with `.into()`. +//! +//! Sometimes, an intrinsic in Rust is implemented by directly using the corresponding +//! LLVM instruction via an `unsafe extern "C"` module. In those cases, the corresponding +//! function is defined in the `c_extern` module in each file, which contain manually +//! written implementations made by consulting the appropriate Intel documentation. +//! +//! 
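+//! For instance, a simple NEON intrinsic such as `vadd_s16` is modeled
+//! directly in terms of the generic `simd_add` helper, as in this sketch of
+//! the style used throughout `neon.rs` below:
+//!
+//! ```ignore
+//! pub fn vadd_s16(a: int16x4_t, b: int16x4_t) -> int16x4_t {
+//!     simd_add(a, b)
+//! }
+//! ```
+//!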
In general, it is best to gain an idea of how an implementation should be written by looking +//! at how other functions are implemented. Also see `core::arch::arm` for [reference](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch). +#![allow(unused)] +#[allow(non_camel_case_types)] +mod types { + use crate::abstractions::simd::*; + pub type int32x4_t = i32x4; + pub type int64x1_t = i64x1; + pub type int64x2_t = i64x2; + pub type int16x8_t = i16x8; + pub type int8x16_t = i8x16; + pub type uint32x4_t = u32x4; + pub type uint64x1_t = u64x1; + pub type uint64x2_t = u64x2; + pub type uint16x8_t = u16x8; + pub type uint8x16_t = u8x16; + pub type int32x2_t = i32x2; + pub type int16x4_t = i16x4; + pub type int8x8_t = i8x8; + pub type uint32x2_t = u32x2; + pub type uint16x4_t = u16x4; + pub type uint8x8_t = u8x8; +} + +pub mod neon; diff --git a/testable-simd-models/src/core_arch/arm_shared/models/neon.rs b/testable-simd-models/src/core_arch/arm_shared/models/neon.rs new file mode 100644 index 0000000000000..794fd25285b47 --- /dev/null +++ b/testable-simd-models/src/core_arch/arm_shared/models/neon.rs @@ -0,0 +1,873 @@ +use super::types::*; +use crate::abstractions::simd::*; + +pub fn vaba_s16(a: int16x4_t, b: int16x4_t, c: int16x4_t) -> int16x4_t { + simd_add(a, vabd_s16(b, c)) +} + +pub fn vaba_s32(a: int32x2_t, b: int32x2_t, c: int32x2_t) -> int32x2_t { + simd_add(a, vabd_s32(b, c)) +} + +pub fn vaba_s8(a: int8x8_t, b: int8x8_t, c: int8x8_t) -> int8x8_t { + simd_add(a, vabd_s8(b, c)) +} + +pub fn vaba_u16(a: uint16x4_t, b: uint16x4_t, c: uint16x4_t) -> uint16x4_t { + simd_add(a, vabd_u16(b, c)) +} + +pub fn vaba_u32(a: uint32x2_t, b: uint32x2_t, c: uint32x2_t) -> uint32x2_t { + simd_add(a, vabd_u32(b, c)) +} + +pub fn vaba_u8(a: uint8x8_t, b: uint8x8_t, c: uint8x8_t) -> uint8x8_t { + simd_add(a, vabd_u8(b, c)) +} + +pub fn vabal_u8(a: uint16x8_t, b: uint8x8_t, c: uint8x8_t) -> uint16x8_t { + let d: uint8x8_t = vabd_u8(b, c); + simd_add(a, simd_cast(d)) +} + +pub fn vabal_u16(a: uint32x4_t, b: uint16x4_t, c: uint16x4_t) -> uint32x4_t { + let d: uint16x4_t = vabd_u16(b, c); + simd_add(a, simd_cast(d)) +} + +pub fn vabal_u32(a: uint64x2_t, b: uint32x2_t, c: uint32x2_t) -> uint64x2_t { + let d: uint32x2_t = vabd_u32(b, c); + simd_add(a, simd_cast(d)) +} + +pub fn vabaq_s16(a: int16x8_t, b: int16x8_t, c: int16x8_t) -> int16x8_t { + simd_add(a, vabdq_s16(b, c)) +} + +pub fn vabaq_s32(a: int32x4_t, b: int32x4_t, c: int32x4_t) -> int32x4_t { + simd_add(a, vabdq_s32(b, c)) +} + +pub fn vabaq_s8(a: int8x16_t, b: int8x16_t, c: int8x16_t) -> int8x16_t { + simd_add(a, vabdq_s8(b, c)) +} + +pub fn vabaq_u16(a: uint16x8_t, b: uint16x8_t, c: uint16x8_t) -> uint16x8_t { + simd_add(a, vabdq_u16(b, c)) +} + +pub fn vabaq_u32(a: uint32x4_t, b: uint32x4_t, c: uint32x4_t) -> uint32x4_t { + simd_add(a, vabdq_u32(b, c)) +} + +pub fn vabaq_u8(a: uint8x16_t, b: uint8x16_t, c: uint8x16_t) -> uint8x16_t { + simd_add(a, vabdq_u8(b, c)) +} + +pub fn vabd_s8(a: int8x8_t, b: int8x8_t) -> int8x8_t { + simd_abs_diff(a, b) +} + +pub fn vabdq_s8(a: int8x16_t, b: int8x16_t) -> int8x16_t { + simd_abs_diff(a, b) +} + +pub fn vabd_s16(a: int16x4_t, b: int16x4_t) -> int16x4_t { + simd_abs_diff(a, b) +} + +pub fn vabdq_s16(a: int16x8_t, b: int16x8_t) -> int16x8_t { + simd_abs_diff(a, b) +} + +pub fn vabd_s32(a: int32x2_t, b: int32x2_t) -> int32x2_t { + simd_abs_diff(a, b) +} + +pub fn vabdq_s32(a: int32x4_t, b: int32x4_t) -> int32x4_t { + simd_abs_diff(a, b) +} + +pub fn vabd_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t 
{ + simd_abs_diff(a, b) +} + +pub fn vabdq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_abs_diff(a, b) +} + +pub fn vabd_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_abs_diff(a, b) +} + +pub fn vabdq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_abs_diff(a, b) +} + +pub fn vabd_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_abs_diff(a, b) +} + +pub fn vabdq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + simd_abs_diff(a, b) +} + +pub fn vabdl_u8(a: uint8x8_t, b: uint8x8_t) -> uint16x8_t { + simd_cast(vabd_u8(a, b)) +} + +pub fn vabdl_u16(a: uint16x4_t, b: uint16x4_t) -> uint32x4_t { + simd_cast(vabd_u16(a, b)) +} + +pub fn vabdl_u32(a: uint32x2_t, b: uint32x2_t) -> uint64x2_t { + simd_cast(vabd_u32(a, b)) +} + +pub fn vabs_s8(a: int8x8_t) -> int8x8_t { + simd_abs(a) +} + +pub fn vabsq_s8(a: int8x16_t) -> int8x16_t { + simd_abs(a) +} + +pub fn vabs_s16(a: int16x4_t) -> int16x4_t { + simd_abs(a) +} + +pub fn vabsq_s16(a: int16x8_t) -> int16x8_t { + simd_abs(a) +} + +pub fn vabs_s32(a: int32x2_t) -> int32x2_t { + simd_abs(a) +} + +pub fn vabsq_s32(a: int32x4_t) -> int32x4_t { + simd_abs(a) +} + +pub fn vadd_s16(a: int16x4_t, b: int16x4_t) -> int16x4_t { + simd_add(a, b) +} + +pub fn vadd_s32(a: int32x2_t, b: int32x2_t) -> int32x2_t { + simd_add(a, b) +} + +pub fn vadd_s8(a: int8x8_t, b: int8x8_t) -> int8x8_t { + simd_add(a, b) +} + +pub fn vadd_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_add(a, b) +} + +pub fn vadd_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_add(a, b) +} + +pub fn vadd_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + simd_add(a, b) +} + +pub fn vaddq_s16(a: int16x8_t, b: int16x8_t) -> int16x8_t { + simd_add(a, b) +} + +pub fn vaddq_s32(a: int32x4_t, b: int32x4_t) -> int32x4_t { + simd_add(a, b) +} + +pub fn vaddq_s64(a: int64x2_t, b: int64x2_t) -> int64x2_t { + simd_add(a, b) +} + +pub fn vaddq_s8(a: int8x16_t, b: int8x16_t) -> int8x16_t { + simd_add(a, b) +} + +pub fn vaddq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_add(a, b) +} + +pub fn vaddq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + simd_add(a, b) +} + +pub fn vaddq_u64(a: uint64x2_t, b: uint64x2_t) -> uint64x2_t { + simd_add(a, b) +} + +pub fn vaddq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_add(a, b) +} + +pub fn vaddhn_high_s16(r: int8x8_t, a: int16x8_t, b: int16x8_t) -> int8x16_t { + let x = simd_cast(simd_shr(simd_add(a, b), int16x8_t::splat(8))); + simd_shuffle(r, x, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]) +} + +pub fn vaddhn_high_s32(r: int16x4_t, a: int32x4_t, b: int32x4_t) -> int16x8_t { + let x = simd_cast(simd_shr(simd_add(a, b), int32x4_t::splat(16))); + simd_shuffle(r, x, [0, 1, 2, 3, 4, 5, 6, 7]) +} + +pub fn vaddhn_high_s64(r: int32x2_t, a: int64x2_t, b: int64x2_t) -> int32x4_t { + let x = simd_cast(simd_shr(simd_add(a, b), int64x2_t::splat(32))); + simd_shuffle(r, x, [0, 1, 2, 3]) +} + +pub fn vaddhn_high_u16(r: uint8x8_t, a: uint16x8_t, b: uint16x8_t) -> uint8x16_t { + let x = simd_cast(simd_shr(simd_add(a, b), uint16x8_t::splat(8))); + simd_shuffle(r, x, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]) +} + +pub fn vaddhn_high_u32(r: uint16x4_t, a: uint32x4_t, b: uint32x4_t) -> uint16x8_t { + let x = simd_cast(simd_shr(simd_add(a, b), uint32x4_t::splat(16))); + simd_shuffle(r, x, [0, 1, 2, 3, 4, 5, 6, 7]) +} + +pub fn vaddhn_high_u64(r: uint32x2_t, a: uint64x2_t, b: uint64x2_t) -> uint32x4_t { + let x = simd_cast(simd_shr(simd_add(a, b), uint64x2_t::splat(32))); + 
simd_shuffle(r, x, [0, 1, 2, 3]) +} + +pub fn vaddhn_s16(a: int16x8_t, b: int16x8_t) -> int8x8_t { + simd_cast(simd_shr(simd_add(a, b), int16x8_t::splat(8))) +} + +pub fn vaddhn_s32(a: int32x4_t, b: int32x4_t) -> int16x4_t { + simd_cast(simd_shr(simd_add(a, b), int32x4_t::splat(16))) +} + +pub fn vaddhn_s64(a: int64x2_t, b: int64x2_t) -> int32x2_t { + simd_cast(simd_shr(simd_add(a, b), int64x2_t::splat(32))) +} + +pub fn vaddhn_u16(a: uint16x8_t, b: uint16x8_t) -> uint8x8_t { + simd_cast(simd_shr(simd_add(a, b), uint16x8_t::splat(8))) +} + +pub fn vaddhn_u32(a: uint32x4_t, b: uint32x4_t) -> uint16x4_t { + simd_cast(simd_shr(simd_add(a, b), uint32x4_t::splat(16))) +} + +pub fn vaddhn_u64(a: uint64x2_t, b: uint64x2_t) -> uint32x2_t { + simd_cast(simd_shr(simd_add(a, b), uint64x2_t::splat(32))) +} + +pub fn vaddl_high_s16(a: int16x8_t, b: int16x8_t) -> int32x4_t { + let a: int16x4_t = simd_shuffle(a, a, [4, 5, 6, 7]); + let b: int16x4_t = simd_shuffle(b, b, [4, 5, 6, 7]); + let a: int32x4_t = simd_cast(a); + let b: int32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_high_s32(a: int32x4_t, b: int32x4_t) -> int64x2_t { + let a: int32x2_t = simd_shuffle(a, a, [2, 3]); + let b: int32x2_t = simd_shuffle(b, b, [2, 3]); + let a: int64x2_t = simd_cast(a); + let b: int64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_high_s8(a: int8x16_t, b: int8x16_t) -> int16x8_t { + let a: int8x8_t = simd_shuffle(a, a, [8, 9, 10, 11, 12, 13, 14, 15]); + let b: int8x8_t = simd_shuffle(b, b, [8, 9, 10, 11, 12, 13, 14, 15]); + let a: int16x8_t = simd_cast(a); + let b: int16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_high_u16(a: uint16x8_t, b: uint16x8_t) -> uint32x4_t { + let a: uint16x4_t = simd_shuffle(a, a, [4, 5, 6, 7]); + let b: uint16x4_t = simd_shuffle(b, b, [4, 5, 6, 7]); + let a: uint32x4_t = simd_cast(a); + let b: uint32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_high_u32(a: uint32x4_t, b: uint32x4_t) -> uint64x2_t { + let a: uint32x2_t = simd_shuffle(a, a, [2, 3]); + let b: uint32x2_t = simd_shuffle(b, b, [2, 3]); + let a: uint64x2_t = simd_cast(a); + let b: uint64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_high_u8(a: uint8x16_t, b: uint8x16_t) -> uint16x8_t { + let a: uint8x8_t = simd_shuffle(a, a, [8, 9, 10, 11, 12, 13, 14, 15]); + let b: uint8x8_t = simd_shuffle(b, b, [8, 9, 10, 11, 12, 13, 14, 15]); + let a: uint16x8_t = simd_cast(a); + let b: uint16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_s16(a: int16x4_t, b: int16x4_t) -> int32x4_t { + let a: int32x4_t = simd_cast(a); + let b: int32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_s32(a: int32x2_t, b: int32x2_t) -> int64x2_t { + let a: int64x2_t = simd_cast(a); + let b: int64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_s8(a: int8x8_t, b: int8x8_t) -> int16x8_t { + let a: int16x8_t = simd_cast(a); + let b: int16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_u16(a: uint16x4_t, b: uint16x4_t) -> uint32x4_t { + let a: uint32x4_t = simd_cast(a); + let b: uint32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_u32(a: uint32x2_t, b: uint32x2_t) -> uint64x2_t { + let a: uint64x2_t = simd_cast(a); + let b: uint64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddl_u8(a: uint8x8_t, b: uint8x8_t) -> uint16x8_t { + let a: uint16x8_t = simd_cast(a); + let b: uint16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_high_s16(a: int32x4_t, b: int16x8_t) -> int32x4_t { + let b: int16x4_t = simd_shuffle(b, b, [4, 5, 6, 7]); + let b: int32x4_t = 
simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_high_s32(a: int64x2_t, b: int32x4_t) -> int64x2_t { + let b: int32x2_t = simd_shuffle(b, b, [2, 3]); + let b: int64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_high_s8(a: int16x8_t, b: int8x16_t) -> int16x8_t { + let b: int8x8_t = simd_shuffle(b, b, [8, 9, 10, 11, 12, 13, 14, 15]); + let b: int16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_high_u16(a: uint32x4_t, b: uint16x8_t) -> uint32x4_t { + let b: uint16x4_t = simd_shuffle(b, b, [4, 5, 6, 7]); + let b: uint32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_high_u32(a: uint64x2_t, b: uint32x4_t) -> uint64x2_t { + let b: uint32x2_t = simd_shuffle(b, b, [2, 3]); + let b: uint64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_high_u8(a: uint16x8_t, b: uint8x16_t) -> uint16x8_t { + let b: uint8x8_t = simd_shuffle(b, b, [8, 9, 10, 11, 12, 13, 14, 15]); + let b: uint16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_s16(a: int32x4_t, b: int16x4_t) -> int32x4_t { + let b: int32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_s32(a: int64x2_t, b: int32x2_t) -> int64x2_t { + let b: int64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_s8(a: int16x8_t, b: int8x8_t) -> int16x8_t { + let b: int16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_u16(a: uint32x4_t, b: uint16x4_t) -> uint32x4_t { + let b: uint32x4_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_u32(a: uint64x2_t, b: uint32x2_t) -> uint64x2_t { + let b: uint64x2_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vaddw_u8(a: uint16x8_t, b: uint8x8_t) -> uint16x8_t { + let b: uint16x8_t = simd_cast(b); + simd_add(a, b) +} + +pub fn vand_s8(a: int8x8_t, b: int8x8_t) -> int8x8_t { + simd_and(a, b) +} + +pub fn vandq_s8(a: int8x16_t, b: int8x16_t) -> int8x16_t { + simd_and(a, b) +} + +pub fn vand_s16(a: int16x4_t, b: int16x4_t) -> int16x4_t { + simd_and(a, b) +} + +pub fn vandq_s16(a: int16x8_t, b: int16x8_t) -> int16x8_t { + simd_and(a, b) +} + +pub fn vand_s32(a: int32x2_t, b: int32x2_t) -> int32x2_t { + simd_and(a, b) +} + +pub fn vandq_s32(a: int32x4_t, b: int32x4_t) -> int32x4_t { + simd_and(a, b) +} + +pub fn vand_s64(a: int64x1_t, b: int64x1_t) -> int64x1_t { + simd_and(a, b) +} + +pub fn vandq_s64(a: int64x2_t, b: int64x2_t) -> int64x2_t { + simd_and(a, b) +} + +pub fn vand_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + simd_and(a, b) +} + +pub fn vandq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_and(a, b) +} + +pub fn vand_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_and(a, b) +} + +pub fn vandq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_and(a, b) +} + +pub fn vand_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_and(a, b) +} + +pub fn vandq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + simd_and(a, b) +} + +pub fn vand_u64(a: uint64x1_t, b: uint64x1_t) -> uint64x1_t { + simd_and(a, b) +} + +pub fn vandq_u64(a: uint64x2_t, b: uint64x2_t) -> uint64x2_t { + simd_and(a, b) +} + +pub fn vbic_s16(a: int16x4_t, b: int16x4_t) -> int16x4_t { + let c = int16x4_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbic_s32(a: int32x2_t, b: int32x2_t) -> int32x2_t { + let c = int32x2_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbic_s64(a: int64x1_t, b: int64x1_t) -> int64x1_t { + let c = int64x1_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbic_s8(a: int8x8_t, b: int8x8_t) -> int8x8_t { + let c = int8x8_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbicq_s16(a: int16x8_t, b: 
int16x8_t) -> int16x8_t { + let c = int16x8_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbicq_s32(a: int32x4_t, b: int32x4_t) -> int32x4_t { + let c = int32x4_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbicq_s64(a: int64x2_t, b: int64x2_t) -> int64x2_t { + let c = int64x2_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbicq_s8(a: int8x16_t, b: int8x16_t) -> int8x16_t { + let c = int8x16_t::splat(-1); + simd_and(simd_xor(b, c), a) +} + +pub fn vbic_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + let c = int16x4_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbic_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + let c = int32x2_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbic_u64(a: uint64x1_t, b: uint64x1_t) -> uint64x1_t { + let c = int64x1_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbic_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + let c = int8x8_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbicq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + let c = int16x8_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbicq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + let c = int32x4_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbicq_u64(a: uint64x2_t, b: uint64x2_t) -> uint64x2_t { + let c = int64x2_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbicq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + let c = int8x16_t::splat(-1); + simd_and(simd_xor(b, simd_cast(c)), a) +} + +pub fn vbsl_s16(a: uint16x4_t, b: int16x4_t, c: int16x4_t) -> int16x4_t { + let not = int16x4_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbsl_s32(a: uint32x2_t, b: int32x2_t, c: int32x2_t) -> int32x2_t { + let not = int32x2_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbsl_s64(a: uint64x1_t, b: int64x1_t, c: int64x1_t) -> int64x1_t { + let not = int64x1_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbsl_s8(a: uint8x8_t, b: int8x8_t, c: int8x8_t) -> int8x8_t { + let not = int8x8_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbslq_s16(a: uint16x8_t, b: int16x8_t, c: int16x8_t) -> int16x8_t { + let not = int16x8_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbslq_s32(a: uint32x4_t, b: int32x4_t, c: int32x4_t) -> int32x4_t { + let not = int32x4_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbslq_s64(a: uint64x2_t, b: int64x2_t, c: int64x2_t) -> int64x2_t { + let not = int64x2_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbslq_s8(a: uint8x16_t, b: int8x16_t, c: int8x16_t) -> int8x16_t { + let not = int8x16_t::splat(-1); + simd_cast(simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), simd_cast(c)), + )) +} + +pub fn vbsl_u16(a: uint16x4_t, b: uint16x4_t, c: uint16x4_t) -> uint16x4_t { + let not = int16x4_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + 
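        // Bit-select: (a & b) | (!a & c), i.e. bits of `b` where the mask `a`
        // is set and bits of `c` where it is clear.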
simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbsl_u32(a: uint32x2_t, b: uint32x2_t, c: uint32x2_t) -> uint32x2_t { + let not = int32x2_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbsl_u64(a: uint64x1_t, b: uint64x1_t, c: uint64x1_t) -> uint64x1_t { + let not = int64x1_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbsl_u8(a: uint8x8_t, b: uint8x8_t, c: uint8x8_t) -> uint8x8_t { + let not = int8x8_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbslq_u16(a: uint16x8_t, b: uint16x8_t, c: uint16x8_t) -> uint16x8_t { + let not = int16x8_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbslq_u32(a: uint32x4_t, b: uint32x4_t, c: uint32x4_t) -> uint32x4_t { + let not = int32x4_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbslq_u64(a: uint64x2_t, b: uint64x2_t, c: uint64x2_t) -> uint64x2_t { + let not = int64x2_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vbslq_u8(a: uint8x16_t, b: uint8x16_t, c: uint8x16_t) -> uint8x16_t { + let not = int8x16_t::splat(-1); + simd_or( + simd_and(a, simd_cast(b)), + simd_and(simd_xor(a, simd_cast(not)), c), + ) +} + +pub fn vceq_s8(a: int8x8_t, b: int8x8_t) -> uint8x8_t { + simd_cast(simd_eq(a, b)) +} + +pub fn vceqq_s8(a: int8x16_t, b: int8x16_t) -> uint8x16_t { + simd_cast(simd_eq(a, b)) +} + +pub fn vceq_s16(a: int16x4_t, b: int16x4_t) -> uint16x4_t { + simd_cast(simd_eq(a, b)) +} + +pub fn vceqq_s16(a: int16x8_t, b: int16x8_t) -> uint16x8_t { + simd_cast(simd_eq(a, b)) +} + +pub fn vceq_s32(a: int32x2_t, b: int32x2_t) -> uint32x2_t { + simd_cast(simd_eq(a, b)) +} + +pub fn vceqq_s32(a: int32x4_t, b: int32x4_t) -> uint32x4_t { + simd_cast(simd_eq(a, b)) +} + +pub fn vceq_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + simd_eq(a, b) +} + +pub fn vceqq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_eq(a, b) +} + +pub fn vceq_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_eq(a, b) +} + +pub fn vceqq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_eq(a, b) +} + +pub fn vceq_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_eq(a, b) +} + +pub fn vceqq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + simd_eq(a, b) +} + +pub fn vcge_s8(a: int8x8_t, b: int8x8_t) -> uint8x8_t { + simd_cast(simd_ge(a, b)) +} + +pub fn vcgeq_s8(a: int8x16_t, b: int8x16_t) -> uint8x16_t { + simd_cast(simd_ge(a, b)) +} + +pub fn vcge_s16(a: int16x4_t, b: int16x4_t) -> uint16x4_t { + simd_cast(simd_ge(a, b)) +} + +pub fn vcgeq_s16(a: int16x8_t, b: int16x8_t) -> uint16x8_t { + simd_cast(simd_ge(a, b)) +} + +pub fn vcge_s32(a: int32x2_t, b: int32x2_t) -> uint32x2_t { + simd_cast(simd_ge(a, b)) +} + +pub fn vcgeq_s32(a: int32x4_t, b: int32x4_t) -> uint32x4_t { + simd_cast(simd_ge(a, b)) +} + +pub fn vcge_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + simd_ge(a, b) +} + +pub fn vcgeq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_ge(a, b) +} + +pub fn vcge_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_ge(a, b) +} + +pub fn vcgeq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_ge(a, b) +} + +pub fn vcge_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_ge(a, b) +} + +pub fn vcgeq_u32(a: uint32x4_t, b: 
uint32x4_t) -> uint32x4_t { + simd_ge(a, b) +} + +pub fn vcgt_s8(a: int8x8_t, b: int8x8_t) -> uint8x8_t { + simd_cast(simd_gt(a, b)) +} + +pub fn vcgtq_s8(a: int8x16_t, b: int8x16_t) -> uint8x16_t { + simd_cast(simd_gt(a, b)) +} + +pub fn vcgt_s16(a: int16x4_t, b: int16x4_t) -> uint16x4_t { + simd_cast(simd_gt(a, b)) +} + +pub fn vcgtq_s16(a: int16x8_t, b: int16x8_t) -> uint16x8_t { + simd_cast(simd_gt(a, b)) +} + +pub fn vcgt_s32(a: int32x2_t, b: int32x2_t) -> uint32x2_t { + simd_cast(simd_gt(a, b)) +} + +pub fn vcgtq_s32(a: int32x4_t, b: int32x4_t) -> uint32x4_t { + simd_cast(simd_gt(a, b)) +} + +pub fn vcgt_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + simd_gt(a, b) +} + +pub fn vcgtq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_gt(a, b) +} + +pub fn vcgt_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_gt(a, b) +} + +pub fn vcgtq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_gt(a, b) +} + +pub fn vcgt_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_gt(a, b) +} + +pub fn vcgtq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + simd_gt(a, b) +} + +pub fn vcle_s8(a: int8x8_t, b: int8x8_t) -> uint8x8_t { + simd_cast(simd_le(a, b)) +} + +pub fn vcleq_s8(a: int8x16_t, b: int8x16_t) -> uint8x16_t { + simd_cast(simd_le(a, b)) +} + +pub fn vcle_s16(a: int16x4_t, b: int16x4_t) -> uint16x4_t { + simd_cast(simd_le(a, b)) +} + +pub fn vcleq_s16(a: int16x8_t, b: int16x8_t) -> uint16x8_t { + simd_cast(simd_le(a, b)) +} + +pub fn vcle_s32(a: int32x2_t, b: int32x2_t) -> uint32x2_t { + simd_cast(simd_le(a, b)) +} + +pub fn vcleq_s32(a: int32x4_t, b: int32x4_t) -> uint32x4_t { + simd_cast(simd_le(a, b)) +} + +pub fn vcle_u8(a: uint8x8_t, b: uint8x8_t) -> uint8x8_t { + simd_le(a, b) +} + +pub fn vcleq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t { + simd_le(a, b) +} + +pub fn vcle_u16(a: uint16x4_t, b: uint16x4_t) -> uint16x4_t { + simd_le(a, b) +} + +pub fn vcleq_u16(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { + simd_le(a, b) +} + +pub fn vcle_u32(a: uint32x2_t, b: uint32x2_t) -> uint32x2_t { + simd_le(a, b) +} + +pub fn vcleq_u32(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { + simd_le(a, b) +} diff --git a/testable-simd-models/src/core_arch/arm_shared/tests/mod.rs b/testable-simd-models/src/core_arch/arm_shared/tests/mod.rs new file mode 100644 index 0000000000000..7ec0df1263b7f --- /dev/null +++ b/testable-simd-models/src/core_arch/arm_shared/tests/mod.rs @@ -0,0 +1,112 @@ +//! Tests for intrinsics defined in `crate::core_arch::models::arm_shared` +//! +//! Each and every modelled intrinsic is tested against the Rust +//! implementation here. For the most part, the tests work by +//! generating random inputs, passing them as arguments +//! to both the models in this crate, and the corresponding intrinsics +//! in the Rust core and then comparing their outputs. +//! +//! To add a test for a modelled intrinsic, go the appropriate file, and +//! use the `mk!` macro to define it. +//! +//! A `mk!` macro invocation looks like the following, +//! `mk!([]{<,>}()) +//! +//! For example, some valid invocations are +//! +//! `mk!([100]_mm256_extracti128_si256{<0>,<1>}(a: BitVec));` +//! `mk!(_mm256_extracti128_si256{<0>,<1>}(a: BitVec));` +//! `mk!(_mm256_abs_epi16(a: BitVec));` +//! +//! The number of random tests is optional. If not provided, it is taken to be 1000 by default. +//! The const values are necessary if the function has constant arguments, but should be discarded if not. +//! The function name and the function arguments are necessary in all cases. +//! 
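//! As a rough sketch (based on the `mk!` macro defined in `neon.rs`), an
//! invocation such as `mk!(vadd_u8(a: uint8x8_t, b: uint8x8_t));` expands to
//! approximately the following test, where `random` comes from
//! `crate::helpers::test::HasRandom` and the `.into()` calls use the
//! conversions defined further down in this module:
//!
//! ```ignore
//! #[test]
//! fn vadd_u8() {
//!     for _ in 0..1000 {
//!         let a = uint8x8_t::random();
//!         let b = uint8x8_t::random();
//!         assert_eq!(
//!             super::super::models::neon::vadd_u8(a.into(), b.into()),
//!             unsafe { FunArray::from(upstream::vadd_u8(a.into(), b.into())).into() },
//!         );
//!     }
//! }
//! ```
//!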
+//! Note: This only works if the function returns a bit-vector or funarray. If it returns an integer, the +//! test has to be written manually. It is recommended that the manually defined test follows +//! the pattern of tests defined via the `mk!` invocation. It is also recommended that, in the +//! case that the intrinsic takes constant arguments, each and every possible constant value +//! (upto a maximum of 255) that can be passed to the function be used for testing. The number +//! of constant values passed depends on if the Rust intrinsics statically asserts that the +//! length of the constant argument be less than or equal to a certain number of bits. + +pub mod neon; + +#[allow(non_camel_case_types)] +mod types { + use crate::abstractions::simd::*; + pub type int32x4_t = i32x4; + pub type int64x1_t = i64x1; + pub type int64x2_t = i64x2; + pub type int16x8_t = i16x8; + pub type int8x16_t = i8x16; + pub type uint32x4_t = u32x4; + pub type uint64x1_t = u64x1; + pub type uint64x2_t = u64x2; + pub type uint16x8_t = u16x8; + pub type uint8x16_t = u8x16; + pub type int32x2_t = i32x2; + pub type int16x4_t = i16x4; + pub type int8x8_t = i8x8; + pub type uint32x2_t = u32x2; + pub type uint16x4_t = u16x4; + pub type uint8x8_t = u8x8; +} + +pub(crate) mod upstream { + #[cfg(target_arch = "aarch64")] + pub use core::arch::aarch64::*; + #[cfg(target_arch = "arm")] + pub use core::arch::arm::*; +} + +#[cfg(any(target_arch = "arm", target_arch = "aarch64"))] +pub mod conversions { + use super::upstream::*; + + use super::types; + use crate::abstractions::bitvec::BitVec; + use crate::abstractions::funarr::FunArray; + + macro_rules! convert{ + ($($ty1:ident [$ty2:ty ; $n:literal]),*) => { + $( + impl From<$ty1> for types::$ty1 { + fn from (arg: $ty1) -> types::$ty1 { + let stuff = unsafe { *(&arg as *const $ty1 as *const [$ty2; $n])}; + FunArray::from_fn(|i| + stuff[i as usize] + ) + } + } + impl From for $ty1 { + fn from (arg: types::$ty1) -> $ty1 { + let bv: &[u8] = &(BitVec::from(arg)).to_vec()[..]; + unsafe { + *(bv.as_ptr() as *const [$ty2; $n] as *const _) + } + } + } + )* + } + } + + convert!( + int32x4_t [i32; 4], + int64x1_t [i64; 1], + int64x2_t [i64; 2], + int16x8_t [i16; 8], + int8x16_t [i8; 16], + uint32x4_t [u32; 4], + uint64x1_t [u64; 1], + uint64x2_t [u64; 2], + uint16x8_t [u16; 8], + uint8x16_t [u8; 16], + int32x2_t [i32; 2], + int16x4_t [i16; 4], + int8x8_t [i8; 8], + uint32x2_t [u32; 2], + uint16x4_t [u16; 4], + uint8x8_t [u8; 8] + ); +} diff --git a/testable-simd-models/src/core_arch/arm_shared/tests/neon.rs b/testable-simd-models/src/core_arch/arm_shared/tests/neon.rs new file mode 100644 index 0000000000000..e07d385f656f6 --- /dev/null +++ b/testable-simd-models/src/core_arch/arm_shared/tests/neon.rs @@ -0,0 +1,218 @@ +#[cfg(test)] +use super::upstream; +use crate::abstractions::funarr::FunArray; +use crate::helpers::test::HasRandom; +/// Derives tests for a given intrinsics. Test that a given intrinsics and its model compute the same thing over random values (1000 by default). +macro_rules! mk { + ($([$N:literal])?$name:ident$({$(<$($c:literal),*>),*})?($($x:ident : $ty:ident),*)) => { + #[test] + fn $name() { + #[allow(unused)] + const N: usize = { + let n: usize = 1000; + $(let n: usize = $N;)? 
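                // If an explicit `[$N]` test count was given, this shadowing
                // replaces the default of 1000 random inputs.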
+ n + }; + mk!(@[N]$name$($(<$($c),*>)*)?($($x : $ty),*)); + } + }; + (@[$N:ident]$name:ident$(<$($c:literal),*>)?($($x:ident : $ty:ident),*)) => { + for _ in 0..$N { + $(let $x = $ty::random();)* + assert_eq!(super::super::models::neon::$name$(::<$($c,)*>)?($($x.into(),)*), unsafe { + FunArray::from(upstream::$name$(::<$($c,)*>)?($($x.into(),)*)).into() + }); + } + }; + (@[$N:ident]$name:ident<$($c1:literal),*>$(<$($c:literal),*>)*($($x:ident : $ty:ident),*)) => { + let one = || { + mk!(@[$N]$name<$($c1),*>($($x : $ty),*)); + }; + one(); + mk!(@[$N]$name$(<$($c),*>)*($($x : $ty),*)); + } + +} + +use super::types::*; +mk!(vaba_s16(a: int16x4_t, b: int16x4_t, c: int16x4_t)); +mk!(vaba_s32(a: int32x2_t, b: int32x2_t, c: int32x2_t)); +mk!(vaba_s8(a: int8x8_t, b: int8x8_t, c: int8x8_t)); +mk!(vaba_u16(a: uint16x4_t, b: uint16x4_t, c: uint16x4_t)); +mk!(vaba_u32(a: uint32x2_t, b: uint32x2_t, c: uint32x2_t)); +mk!(vaba_u8(a: uint8x8_t, b: uint8x8_t, c: uint8x8_t)); +mk!(vabal_u8(a: uint16x8_t, b: uint8x8_t, c: uint8x8_t)); +mk!(vabal_u16(a: uint32x4_t, b: uint16x4_t, c: uint16x4_t)); +mk!(vabal_u32(a: uint64x2_t, b: uint32x2_t, c: uint32x2_t)); +mk!(vabaq_s16(a: int16x8_t, b: int16x8_t, c: int16x8_t)); +mk!(vabaq_s32(a: int32x4_t, b: int32x4_t, c: int32x4_t)); +mk!(vabaq_s8(a: int8x16_t, b: int8x16_t, c: int8x16_t)); +mk!(vabaq_u16(a: uint16x8_t, b: uint16x8_t, c: uint16x8_t)); +mk!(vabaq_u32(a: uint32x4_t, b: uint32x4_t, c: uint32x4_t)); +mk!(vabaq_u8(a: uint8x16_t, b: uint8x16_t, c: uint8x16_t)); +mk!(vabd_s8(a: int8x8_t, b: int8x8_t)); +mk!(vabdq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vabd_s16(a: int16x4_t, b: int16x4_t)); +mk!(vabdq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vabd_s32(a: int32x2_t, b: int32x2_t)); +mk!(vabdq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vabd_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vabdq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vabd_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vabdq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vabd_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vabdq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vabdl_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vabdl_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vabdl_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vabs_s8(a: int8x8_t)); +mk!(vabsq_s8(a: int8x16_t)); +mk!(vabs_s16(a: int16x4_t)); +mk!(vabsq_s16(a: int16x8_t)); +mk!(vabs_s32(a: int32x2_t)); +mk!(vabsq_s32(a: int32x4_t)); +mk!(vadd_s16(a: int16x4_t, b: int16x4_t)); +mk!(vadd_s32(a: int32x2_t, b: int32x2_t)); +mk!(vadd_s8(a: int8x8_t, b: int8x8_t)); +mk!(vadd_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vadd_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vadd_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vaddq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vaddq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vaddq_s64(a: int64x2_t, b: int64x2_t)); +mk!(vaddq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vaddq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vaddq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vaddq_u64(a: uint64x2_t, b: uint64x2_t)); +mk!(vaddq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vaddhn_high_s16(r: int8x8_t, a: int16x8_t, b: int16x8_t)); +mk!(vaddhn_high_s32(r: int16x4_t, a: int32x4_t, b: int32x4_t)); +mk!(vaddhn_high_s64(r: int32x2_t, a: int64x2_t, b: int64x2_t)); +mk!(vaddhn_high_u16(r: uint8x8_t, a: uint16x8_t, b: uint16x8_t)); +mk!(vaddhn_high_u32(r: uint16x4_t, a: uint32x4_t, b: uint32x4_t)); +mk!(vaddhn_high_u64(r: uint32x2_t, a: uint64x2_t, b: uint64x2_t)); +mk!(vaddhn_s16(a: int16x8_t, b: int16x8_t)); +mk!(vaddhn_s32(a: int32x4_t, b: int32x4_t)); +mk!(vaddhn_s64(a: int64x2_t, b: int64x2_t)); 
+mk!(vaddhn_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vaddhn_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vaddhn_u64(a: uint64x2_t, b: uint64x2_t)); +mk!(vaddl_high_s16(a: int16x8_t, b: int16x8_t)); +mk!(vaddl_high_s32(a: int32x4_t, b: int32x4_t)); +mk!(vaddl_high_s8(a: int8x16_t, b: int8x16_t)); +mk!(vaddl_high_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vaddl_high_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vaddl_high_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vaddl_s16(a: int16x4_t, b: int16x4_t)); +mk!(vaddl_s32(a: int32x2_t, b: int32x2_t)); +mk!(vaddl_s8(a: int8x8_t, b: int8x8_t)); +mk!(vaddl_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vaddl_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vaddl_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vaddw_high_s16(a: int32x4_t, b: int16x8_t)); +mk!(vaddw_high_s32(a: int64x2_t, b: int32x4_t)); +mk!(vaddw_high_s8(a: int16x8_t, b: int8x16_t)); +mk!(vaddw_high_u16(a: uint32x4_t, b: uint16x8_t)); +mk!(vaddw_high_u32(a: uint64x2_t, b: uint32x4_t)); +mk!(vaddw_high_u8(a: uint16x8_t, b: uint8x16_t)); +mk!(vaddw_s16(a: int32x4_t, b: int16x4_t)); +mk!(vaddw_s32(a: int64x2_t, b: int32x2_t)); +mk!(vaddw_s8(a: int16x8_t, b: int8x8_t)); +mk!(vaddw_u16(a: uint32x4_t, b: uint16x4_t)); +mk!(vaddw_u32(a: uint64x2_t, b: uint32x2_t)); +mk!(vaddw_u8(a: uint16x8_t, b: uint8x8_t)); +mk!(vand_s8(a: int8x8_t, b: int8x8_t)); +mk!(vandq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vand_s16(a: int16x4_t, b: int16x4_t)); +mk!(vandq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vand_s32(a: int32x2_t, b: int32x2_t)); +mk!(vandq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vand_s64(a: int64x1_t, b: int64x1_t)); +mk!(vandq_s64(a: int64x2_t, b: int64x2_t)); +mk!(vand_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vandq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vand_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vandq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vand_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vandq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vand_u64(a: uint64x1_t, b: uint64x1_t)); +mk!(vandq_u64(a: uint64x2_t, b: uint64x2_t)); +mk!(vbic_s16(a: int16x4_t, b: int16x4_t)); +mk!(vbic_s32(a: int32x2_t, b: int32x2_t)); +mk!(vbic_s8(a: int8x8_t, b: int8x8_t)); +mk!(vbicq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vbicq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vbicq_s64(a: int64x2_t, b: int64x2_t)); +mk!(vbicq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vbic_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vbic_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vbic_u64(a: uint64x1_t, b: uint64x1_t)); +mk!(vbic_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vbicq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vbicq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vbicq_u64(a: uint64x2_t, b: uint64x2_t)); +mk!(vbicq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vbsl_s16(a: uint16x4_t, b: int16x4_t, c: int16x4_t)); +mk!(vbsl_s32(a: uint32x2_t, b: int32x2_t, c: int32x2_t)); +mk!(vbsl_s64(a: uint64x1_t, b: int64x1_t, c: int64x1_t)); +mk!(vbsl_s8(a: uint8x8_t, b: int8x8_t, c: int8x8_t)); +mk!(vbslq_s16(a: uint16x8_t, b: int16x8_t, c: int16x8_t)); +mk!(vbslq_s32(a: uint32x4_t, b: int32x4_t, c: int32x4_t)); +mk!(vbslq_s64(a: uint64x2_t, b: int64x2_t, c: int64x2_t)); +mk!(vbslq_s8(a: uint8x16_t, b: int8x16_t, c: int8x16_t)); +mk!(vbsl_u16(a: uint16x4_t, b: uint16x4_t, c: uint16x4_t)); +mk!(vbsl_u32(a: uint32x2_t, b: uint32x2_t, c: uint32x2_t)); +mk!(vbsl_u64(a: uint64x1_t, b: uint64x1_t, c: uint64x1_t)); +mk!(vbsl_u8(a: uint8x8_t, b: uint8x8_t, c: uint8x8_t)); +mk!(vbslq_u16(a: uint16x8_t, b: uint16x8_t, c: uint16x8_t)); +mk!(vbslq_u32(a: uint32x4_t, b: uint32x4_t, c: uint32x4_t)); +mk!(vbslq_u64(a: uint64x2_t, 
b: uint64x2_t, c: uint64x2_t)); +mk!(vbslq_u8(a: uint8x16_t, b: uint8x16_t, c: uint8x16_t)); +mk!(vceq_s8(a: int8x8_t, b: int8x8_t)); +mk!(vceqq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vceq_s16(a: int16x4_t, b: int16x4_t)); +mk!(vceqq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vceq_s32(a: int32x2_t, b: int32x2_t)); +mk!(vceqq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vceq_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vceqq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vceq_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vceqq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vceq_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vceqq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vcge_s8(a: int8x8_t, b: int8x8_t)); +mk!(vcgeq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vcge_s16(a: int16x4_t, b: int16x4_t)); +mk!(vcgeq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vcge_s32(a: int32x2_t, b: int32x2_t)); +mk!(vcgeq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vcge_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vcgeq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vcge_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vcgeq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vcge_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vcgeq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vcgt_s8(a: int8x8_t, b: int8x8_t)); +mk!(vcgtq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vcgt_s16(a: int16x4_t, b: int16x4_t)); +mk!(vcgtq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vcgt_s32(a: int32x2_t, b: int32x2_t)); +mk!(vcgtq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vcgt_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vcgtq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vcgt_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vcgtq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vcgt_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vcgtq_u32(a: uint32x4_t, b: uint32x4_t)); +mk!(vcle_s8(a: int8x8_t, b: int8x8_t)); +mk!(vcleq_s8(a: int8x16_t, b: int8x16_t)); +mk!(vcle_s16(a: int16x4_t, b: int16x4_t)); +mk!(vcleq_s16(a: int16x8_t, b: int16x8_t)); +mk!(vcle_s32(a: int32x2_t, b: int32x2_t)); +mk!(vcleq_s32(a: int32x4_t, b: int32x4_t)); +mk!(vcle_u8(a: uint8x8_t, b: uint8x8_t)); +mk!(vcleq_u8(a: uint8x16_t, b: uint8x16_t)); +mk!(vcle_u16(a: uint16x4_t, b: uint16x4_t)); +mk!(vcleq_u16(a: uint16x8_t, b: uint16x8_t)); +mk!(vcle_u32(a: uint32x2_t, b: uint32x2_t)); +mk!(vcleq_u32(a: uint32x4_t, b: uint32x4_t)); diff --git a/testable-simd-models/src/core_arch/x86/mod.rs b/testable-simd-models/src/core_arch/x86/mod.rs new file mode 100644 index 0000000000000..3c5cd51d9c56b --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/mod.rs @@ -0,0 +1,4 @@ +pub mod models; +#[cfg(test)] +#[cfg(any(target_arch = "x86", target_arch = "x86_64"))] +mod tests; diff --git a/testable-simd-models/src/core_arch/x86/models/avx.rs b/testable-simd-models/src/core_arch/x86/models/avx.rs new file mode 100644 index 0000000000000..f392a7abf05b0 --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/models/avx.rs @@ -0,0 +1,432 @@ +//! Advanced Vector Extensions (AVX) +//! +//! The references are: +//! +//! - [Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2: +//! Instruction Set Reference, A-Z][intel64_ref]. - [AMD64 Architecture +//! Programmer's Manual, Volume 3: General-Purpose and System +//! Instructions][amd64_ref]. +//! +//! [Wikipedia][wiki] provides a quick overview of the instructions available. +//! +//! [intel64_ref]: http://www.intel.de/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf +//! [amd64_ref]: http://support.amd.com/TechDocs/24594.pdf +//! 
[wiki]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions + +use super::types::*; +use crate::abstractions::{bit::Bit, bitvec::BitVec, simd::*}; + +mod c_extern { + use crate::abstractions::simd::*; + + pub fn vperm2f128si256(a: i32x8, b: i32x8, imm8: i8) -> i32x8 { + let temp = i128x2::from_fn(|i| match (imm8 as u8) >> (i * 4) { + 0 => (a[4 * i] as i128) + 16 * (a[4 * i + 1] as i128), + 1 => (a[4 * i + 2] as i128) + 16 * (a[4 * i + 3] as i128), + 2 => (b[4 * i] as i128) + 16 * (b[4 * i + 1] as i128), + 3 => (b[4 * i + 2] as i128) + 16 * (b[4 * i + 3] as i128), + _ => unreachable!(), + }); + + i32x8::from_fn(|i| (temp[if i < 4 { 0 } else { 1 }] >> (i % 4)) as i32) + } +} + +use c_extern::*; +/// Blends packed single-precision (32-bit) floating-point elements from +/// `a` and `b` using `c` as a mask. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_blendv_ps) +pub fn _mm256_blendv_ps(a: __m256, b: __m256, c: __m256) -> __m256 { + let mask: i32x8 = simd_lt(BitVec::to_i32x8(c), i32x8::from_fn(|_| 0)); + BitVec::from_i32x8(simd_select(mask, BitVec::to_i32x8(b), BitVec::to_i32x8(a))) +} + +/// Equal (ordered, non-signaling) + +pub const _CMP_EQ_OQ: i32 = 0x00; +/// Less-than (ordered, signaling) + +pub const _CMP_LT_OS: i32 = 0x01; +/// Less-than-or-equal (ordered, signaling) + +pub const _CMP_LE_OS: i32 = 0x02; +/// Unordered (non-signaling) + +pub const _CMP_UNORD_Q: i32 = 0x03; +/// Not-equal (unordered, non-signaling) + +pub const _CMP_NEQ_UQ: i32 = 0x04; +/// Not-less-than (unordered, signaling) + +pub const _CMP_NLT_US: i32 = 0x05; +/// Not-less-than-or-equal (unordered, signaling) + +pub const _CMP_NLE_US: i32 = 0x06; +/// Ordered (non-signaling) + +pub const _CMP_ORD_Q: i32 = 0x07; +/// Equal (unordered, non-signaling) + +pub const _CMP_EQ_UQ: i32 = 0x08; +/// Not-greater-than-or-equal (unordered, signaling) + +pub const _CMP_NGE_US: i32 = 0x09; +/// Not-greater-than (unordered, signaling) + +pub const _CMP_NGT_US: i32 = 0x0a; +/// False (ordered, non-signaling) + +pub const _CMP_FALSE_OQ: i32 = 0x0b; +/// Not-equal (ordered, non-signaling) + +pub const _CMP_NEQ_OQ: i32 = 0x0c; +/// Greater-than-or-equal (ordered, signaling) + +pub const _CMP_GE_OS: i32 = 0x0d; +/// Greater-than (ordered, signaling) + +pub const _CMP_GT_OS: i32 = 0x0e; +/// True (unordered, non-signaling) + +pub const _CMP_TRUE_UQ: i32 = 0x0f; +/// Equal (ordered, signaling) + +pub const _CMP_EQ_OS: i32 = 0x10; +/// Less-than (ordered, non-signaling) + +pub const _CMP_LT_OQ: i32 = 0x11; +/// Less-than-or-equal (ordered, non-signaling) + +pub const _CMP_LE_OQ: i32 = 0x12; +/// Unordered (signaling) + +pub const _CMP_UNORD_S: i32 = 0x13; +/// Not-equal (unordered, signaling) + +pub const _CMP_NEQ_US: i32 = 0x14; +/// Not-less-than (unordered, non-signaling) + +pub const _CMP_NLT_UQ: i32 = 0x15; +/// Not-less-than-or-equal (unordered, non-signaling) + +pub const _CMP_NLE_UQ: i32 = 0x16; +/// Ordered (signaling) + +pub const _CMP_ORD_S: i32 = 0x17; +/// Equal (unordered, signaling) + +pub const _CMP_EQ_US: i32 = 0x18; +/// Not-greater-than-or-equal (unordered, non-signaling) + +pub const _CMP_NGE_UQ: i32 = 0x19; +/// Not-greater-than (unordered, non-signaling) + +pub const _CMP_NGT_UQ: i32 = 0x1a; +/// False (ordered, signaling) + +pub const _CMP_FALSE_OS: i32 = 0x1b; +/// Not-equal (ordered, signaling) + +pub const _CMP_NEQ_OS: i32 = 0x1c; +/// Greater-than-or-equal (ordered, non-signaling) + +pub const _CMP_GE_OQ: i32 = 0x1d; +/// Greater-than 
(ordered, non-signaling) + +pub const _CMP_GT_OQ: i32 = 0x1e; +/// True (unordered, signaling) + +pub const _CMP_TRUE_US: i32 = 0x1f; + +pub fn _mm256_permute2f128_si256(a: __m256i, b: __m256i) -> __m256i { + // // static_assert_uimm_bits!(IMM8, 8); + vperm2f128si256(BitVec::to_i32x8(a), BitVec::to_i32x8(b), IMM8 as i8).into() +} + +/// Copies `a` to result, then inserts 128 bits from `b` into result +/// at the location specified by `imm8`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_insertf128_si256) + +pub fn _mm256_insertf128_si256(a: __m256i, b: __m128i) -> __m256i { + // // static_assert_uimm_bits!(IMM1, 1); + + let dst: i64x4 = simd_shuffle( + BitVec::to_i64x4(a), + BitVec::to_i64x4(_mm256_castsi128_si256(b)), + [[4, 5, 2, 3], [0, 1, 4, 5]][IMM1 as usize], + ); + dst.into() +} + +/// Copies `a` to result, and inserts the 8-bit integer `i` into result +/// at the location specified by `index`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_insert_epi8) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_insert_epi8(a: __m256i, i: i8) -> __m256i { + // // static_assert_uimm_bits!(INDEX, 5); + simd_insert(BitVec::to_i8x32(a), INDEX as u64, i).into() +} + +/// Copies `a` to result, and inserts the 16-bit integer `i` into result +/// at the location specified by `index`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_insert_epi16) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_insert_epi16(a: __m256i, i: i16) -> __m256i { + // // static_assert_uimm_bits!(INDEX, 4); + simd_insert(BitVec::to_i16x16(a), INDEX as u64, i).into() +} + +/// Computes the bitwise AND of 256 bits (representing integer data) in `a` and +/// `b`, and set `ZF` to 1 if the result is zero, otherwise set `ZF` to 0. +/// Computes the bitwise NOT of `a` and then AND with `b`, and set `CF` to 1 if +/// the result is zero, otherwise set `CF` to 0. Return the `ZF` value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_testz_si256) +pub fn _mm256_testz_si256(a: __m256i, b: __m256i) -> i32 { + let c = BitVec::<256>::from_fn(|i| match (a[i], b[i]) { + (Bit::One, Bit::One) => Bit::One, + _ => Bit::Zero, + }); + let all_zero = c.fold(true, |acc, bit| acc && bit == Bit::Zero); + if all_zero { + 1 + } else { + 0 + } +} + +/// Sets each bit of the returned mask based on the most significant bit of the +/// corresponding packed single-precision (32-bit) floating-point element in +/// `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_movemask_ps) +pub fn _mm256_movemask_ps(a: __m256) -> i32 { + // Propagate the highest bit to the rest, because simd_bitmask + // requires all-1 or all-0. + let mask: i32x8 = simd_lt(BitVec::to_i32x8(a), i32x8::from_fn(|_| 0)); + let r = simd_bitmask_little!(7, mask, u8); + r as u32 as i32 +} + +/// Returns vector of type __m256 with all elements set to zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_setzero_ps) + +pub fn _mm256_setzero_ps() -> __m256 { + BitVec::from_fn(|_| Bit::Zero) +} + +/// Returns vector of type __m256i with all elements set to zero. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_setzero_si256) + +pub fn _mm256_setzero_si256() -> __m256i { + BitVec::from_fn(|_| Bit::Zero) +} + +/// Sets packed 8-bit integers in returned vector with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set_epi8) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_set_epi8( + e00: i8, + e01: i8, + e02: i8, + e03: i8, + e04: i8, + e05: i8, + e06: i8, + e07: i8, + e08: i8, + e09: i8, + e10: i8, + e11: i8, + e12: i8, + e13: i8, + e14: i8, + e15: i8, + e16: i8, + e17: i8, + e18: i8, + e19: i8, + e20: i8, + e21: i8, + e22: i8, + e23: i8, + e24: i8, + e25: i8, + e26: i8, + e27: i8, + e28: i8, + e29: i8, + e30: i8, + e31: i8, +) -> __m256i { + let vec = [ + e00, e01, e02, e03, e04, e05, e06, e07, e08, e09, e10, e11, e12, e13, e14, e15, e16, e17, + e18, e19, e20, e21, e22, e23, e24, e25, e26, e27, e28, e29, e30, e31, + ]; + BitVec::from_i8x32(i8x32::from_fn(|i| vec[(31 - i) as usize])) +} + +/// Sets packed 16-bit integers in returned vector with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set_epi16) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_set_epi16( + e00: i16, + e01: i16, + e02: i16, + e03: i16, + e04: i16, + e05: i16, + e06: i16, + e07: i16, + e08: i16, + e09: i16, + e10: i16, + e11: i16, + e12: i16, + e13: i16, + e14: i16, + e15: i16, +) -> __m256i { + let vec = [ + e00, e01, e02, e03, e04, e05, e06, e07, e08, e09, e10, e11, e12, e13, e14, e15, + ]; + BitVec::from_i16x16(i16x16::from_fn(|i| vec[(15 - i) as usize])) +} + +/// Sets packed 32-bit integers in returned vector with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set_epi32) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_set_epi32( + e0: i32, + e1: i32, + e2: i32, + e3: i32, + e4: i32, + e5: i32, + e6: i32, + e7: i32, +) -> __m256i { + let vec = [e0, e1, e2, e3, e4, e5, e6, e7]; + BitVec::from_i32x8(i32x8::from_fn(|i| vec[(7 - i) as usize])) +} + +/// Sets packed 64-bit integers in returned vector with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set_epi64x) +// This intrinsic has no corresponding instruction. +pub fn _mm256_set_epi64x(a: i64, b: i64, c: i64, d: i64) -> __m256i { + let vec = [d, c, b, a]; + BitVec::from_i64x4(i64x4::from_fn(|i| vec[i as usize])) +} + +/// Broadcasts 8-bit integer `a` to all elements of returned vector. +/// This intrinsic may generate the `vpbroadcastw`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set1_epi16) + +// + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_set1_epi8(val: i8) -> BitVec<256> { + BitVec::from_i8x32(i8x32::from_fn(|_| val)) +} + +/// Broadcasts 16-bit integer `a` to all elements of returned vector. +/// This intrinsic may generate the `vpbroadcastw`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set1_epi16) + +// + +// This intrinsic has no corresponding instruction. 
+ +pub fn _mm256_set1_epi16(a: i16) -> __m256i { + BitVec::from_i16x16(i16x16::from_fn(|_| a)) +} + +/// Broadcasts 32-bit integer `a` to all elements of returned vector. +/// This intrinsic may generate the `vpbroadcastd`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set1_epi32) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_set1_epi32(a: i32) -> __m256i { + BitVec::from_i32x8(i32x8::from_fn(|_| a)) +} + +/// Broadcasts 64-bit integer `a` to all elements of returned vector. +/// This intrinsic may generate the `vpbroadcastq`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set1_epi64x) +// This intrinsic has no corresponding instruction. +pub fn _mm256_set1_epi64x(a: i64) -> __m256i { + BitVec::from_i64x4(i64x4::from_fn(|_| a)) +} + +pub fn _mm256_castps_si256(a: __m256) -> __m256i { + a +} + +/// Casts vector of type __m256i to type __m256. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_castsi256_ps) +// This intrinsic is only used for compilation and does not generate any +// instructions, thus it has zero latency. +pub fn _mm256_castsi256_ps(a: __m256i) -> __m256 { + a +} + +/// Casts vector of type __m256i to type __m128i. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_castsi256_si128) + +// This intrinsic is only used for compilation and does not generate any +// instructions, thus it has zero latency. + +pub fn _mm256_castsi256_si128(a: __m256i) -> __m128i { + BitVec::from_fn(|i| a[i]) +} + +/// Casts vector of type __m128i to type __m256i; +/// the upper 128 bits of the result are undefined. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_castsi128_si256) + +// This intrinsic is only used for compilation and does not generate any +// instructions, thus it has zero latency. + +pub fn _mm256_castsi128_si256(a: __m128i) -> __m256i { + let a = BitVec::to_i64x2(a); + let undefined = i64x2::from_fn(|_| 0); + let dst: i64x4 = simd_shuffle(a, undefined, [0, 1, 2, 2]); + BitVec::from_i64x4(dst) +} + +/// Sets packed __m256i returned vector with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_set_m128i) + +pub fn _mm256_set_m128i(hi: __m128i, lo: __m128i) -> __m256i { + BitVec::from_fn(|i| if i < 128 { lo[i] } else { hi[i - 128] }) +} diff --git a/testable-simd-models/src/core_arch/x86/models/avx2.rs b/testable-simd-models/src/core_arch/x86/models/avx2.rs new file mode 100644 index 0000000000000..05173b19a8c58 --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/models/avx2.rs @@ -0,0 +1,2493 @@ +//! Advanced Vector Extensions 2 (AVX) +//! +//! +//! This module contains models for AVX2 intrinsics. +//! AVX2 expands most AVX commands to 256-bit wide vector registers and +//! adds [FMA](https://en.wikipedia.org/wiki/Fused_multiply-accumulate). +//! +//! The references are: +//! +//! - [Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2: +//! Instruction Set Reference, A-Z][intel64_ref]. +//! - [AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and +//! System Instructions][amd64_ref]. +//! +//! Wikipedia's [AVX][wiki_avx] and [FMA][wiki_fma] pages provide a quick +//! 
overview of the instructions available. +//! +//! [intel64_ref]: http://www.intel.de/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf +//! [amd64_ref]: http://support.amd.com/TechDocs/24594.pdf +//! [wiki_avx]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions +//! [wiki_fma]: https://en.wikipedia.org/wiki/Fused_multiply-accumulate +use crate::abstractions::{bitvec::BitVec, simd::*}; + +mod c_extern { + use crate::abstractions::{bit::MachineInteger, simd::*}; + pub fn phaddw(a: i16x16, b: i16x16) -> i16x16 { + i16x16::from_fn(|i| { + if i < 4 { + a[2 * i].wrapping_add(a[2 * i + 1]) + } else if i < 8 { + b[2 * (i - 4)].wrapping_add(b[2 * (i - 4) + 1]) + } else if i < 12 { + a[2 * (i - 4)].wrapping_add(a[2 * (i - 4) + 1]) + } else { + b[2 * (i - 8)].wrapping_add(b[2 * (i - 8) + 1]) + } + }) + } + + pub fn phaddd(a: i32x8, b: i32x8) -> i32x8 { + i32x8::from_fn(|i| { + if i < 2 { + a[2 * i].wrapping_add(a[2 * i + 1]) + } else if i < 4 { + b[2 * (i - 2)].wrapping_add(b[2 * (i - 2) + 1]) + } else if i < 6 { + a[2 * (i - 2)].wrapping_add(a[2 * (i - 2) + 1]) + } else { + b[2 * (i - 4)].wrapping_add(b[2 * (i - 4) + 1]) + } + }) + } + + pub fn phaddsw(a: i16x16, b: i16x16) -> i16x16 { + i16x16::from_fn(|i| { + if i < 4 { + a[2 * i].saturating_add(a[2 * i + 1]) + } else if i < 8 { + b[2 * (i - 4)].saturating_add(b[2 * (i - 4) + 1]) + } else if i < 12 { + a[2 * (i - 4)].saturating_add(a[2 * (i - 4) + 1]) + } else { + b[2 * (i - 8)].saturating_add(b[2 * (i - 8) + 1]) + } + }) + } + + pub fn phsubw(a: i16x16, b: i16x16) -> i16x16 { + i16x16::from_fn(|i| { + if i < 4 { + a[2 * i].wrapping_sub(a[2 * i + 1]) + } else if i < 8 { + b[2 * (i - 4)].wrapping_sub(b[2 * (i - 4) + 1]) + } else if i < 12 { + a[2 * (i - 4)].wrapping_sub(a[2 * (i - 4) + 1]) + } else { + b[2 * (i - 8)].wrapping_sub(b[2 * (i - 8) + 1]) + } + }) + } + + pub fn phsubd(a: i32x8, b: i32x8) -> i32x8 { + i32x8::from_fn(|i| { + if i < 2 { + a[2 * i].wrapping_sub(a[2 * i + 1]) + } else if i < 4 { + b[2 * (i - 2)].wrapping_sub(b[2 * (i - 2) + 1]) + } else if i < 6 { + a[2 * (i - 2)].wrapping_sub(a[2 * (i - 2) + 1]) + } else { + b[2 * (i - 4)].wrapping_sub(b[2 * (i - 4) + 1]) + } + }) + } + + pub fn phsubsw(a: i16x16, b: i16x16) -> i16x16 { + i16x16::from_fn(|i| { + if i < 4 { + a[2 * i].saturating_sub(a[2 * i + 1]) + } else if i < 8 { + b[2 * (i - 4)].saturating_sub(b[2 * (i - 4) + 1]) + } else if i < 12 { + a[2 * (i - 4)].saturating_sub(a[2 * (i - 4) + 1]) + } else { + b[2 * (i - 8)].saturating_sub(b[2 * (i - 8) + 1]) + } + }) + } + pub fn pmaddwd(a: i16x16, b: i16x16) -> i32x8 { + i32x8::from_fn(|i| { + (a[2 * i] as i32) * (b[2 * i] as i32) + (a[2 * i + 1] as i32) * (b[2 * i + 1] as i32) + }) + } + + pub fn pmaddubsw(a: u8x32, b: u8x32) -> i16x16 { + i16x16::from_fn(|i| { + ((a[2 * i] as u8 as u16 as i16) * (b[2 * i] as i8 as i16)) + .saturating_add((a[2 * i + 1] as u8 as u16 as i16) * (b[2 * i + 1] as i8 as i16)) + }) + } + pub fn packsswb(a: i16x16, b: i16x16) -> i8x32 { + i8x32::from_fn(|i| { + if i < 8 { + if a[i] > (i8::MAX as i16) { + i8::MAX + } else if a[i] < (i8::MIN as i16) { + i8::MIN + } else { + a[i] as i8 + } + } else if i < 16 { + if b[i - 8] > (i8::MAX as i16) { + i8::MAX + } else if b[i - 8] < (i8::MIN as i16) { + i8::MIN + } else { + b[i - 8] as i8 + } + } else if i < 24 { + if a[i - 8] > (i8::MAX as i16) { + i8::MAX + } else if a[i - 8] < (i8::MIN as i16) { + i8::MIN + } else { + a[i - 8] as i8 + } + } else { + if b[i - 16] > (i8::MAX 
as i16) { + i8::MAX + } else if b[i - 16] < (i8::MIN as i16) { + i8::MIN + } else { + b[i - 16] as i8 + } + } + }) + } + + pub fn packssdw(a: i32x8, b: i32x8) -> i16x16 { + i16x16::from_fn(|i| { + if i < 4 { + if a[i] > (i16::MAX as i32) { + i16::MAX + } else if a[i] < (i16::MIN as i32) { + i16::MIN + } else { + a[i] as i16 + } + } else if i < 8 { + if b[i - 4] > (i16::MAX as i32) { + i16::MAX + } else if b[i - 4] < (i16::MIN as i32) { + i16::MIN + } else { + b[i - 4] as i16 + } + } else if i < 12 { + if a[i - 4] > (i16::MAX as i32) { + i16::MAX + } else if a[i - 4] < (i16::MIN as i32) { + i16::MIN + } else { + a[i - 4] as i16 + } + } else { + if b[i - 8] > (i16::MAX as i32) { + i16::MAX + } else if b[i - 8] < (i16::MIN as i32) { + i16::MIN + } else { + b[i - 8] as i16 + } + } + }) + } + + pub fn packuswb(a: i16x16, b: i16x16) -> u8x32 { + u8x32::from_fn(|i| { + if i < 8 { + if a[i] > (u8::MAX as i16) { + u8::MAX + } else if a[i] < (u8::MIN as i16) { + u8::MIN + } else { + a[i] as u8 + } + } else if i < 16 { + if b[i - 8] > (u8::MAX as i16) { + u8::MAX + } else if b[i - 8] < (u8::MIN as i16) { + u8::MIN + } else { + b[i - 8] as u8 + } + } else if i < 24 { + if a[i - 8] > (u8::MAX as i16) { + u8::MAX + } else if a[i - 8] < (u8::MIN as i16) { + u8::MIN + } else { + a[i - 8] as u8 + } + } else { + if b[i - 16] > (u8::MAX as i16) { + u8::MAX + } else if b[i - 16] < (u8::MIN as i16) { + u8::MIN + } else { + b[i - 16] as u8 + } + } + }) + } + + pub fn packusdw(a: i32x8, b: i32x8) -> u16x16 { + u16x16::from_fn(|i| { + if i < 4 { + if a[i] > (u16::MAX as i32) { + u16::MAX + } else if a[i] < (u16::MIN as i32) { + u16::MIN + } else { + a[i] as u16 + } + } else if i < 8 { + if b[i - 4] > (u16::MAX as i32) { + u16::MAX + } else if b[i - 4] < (u16::MIN as i32) { + u16::MIN + } else { + b[i - 4] as u16 + } + } else if i < 12 { + if a[i - 4] > (u16::MAX as i32) { + u16::MAX + } else if a[i - 4] < (u16::MIN as i32) { + u16::MIN + } else { + a[i - 4] as u16 + } + } else { + if b[i - 8] > (u16::MAX as i32) { + u16::MAX + } else if b[i - 8] < (u16::MIN as i32) { + u16::MIN + } else { + b[i - 8] as u16 + } + } + }) + } + + pub fn psignb(a: i8x32, b: i8x32) -> i8x32 { + i8x32::from_fn(|i| { + if b[i] < 0 { + if a[i] == i8::MIN { + a[i] + } else { + -a[i] + } + } else if b[i] > 0 { + a[i] + } else { + 0 + } + }) + } + pub fn psignw(a: i16x16, b: i16x16) -> i16x16 { + i16x16::from_fn(|i| { + if b[i] < 0 { + if a[i] == i16::MIN { + a[i] + } else { + -a[i] + } + } else if b[i] > 0 { + a[i] + } else { + 0 + } + }) + } + + pub fn psignd(a: i32x8, b: i32x8) -> i32x8 { + i32x8::from_fn(|i| { + if b[i] < 0 { + if a[i] == i32::MIN { + a[i] + } else { + -a[i] + } + } else if b[i] > 0 { + a[i] + } else { + 0 + } + }) + } + + pub fn psllw(a: i16x16, count: i16x8) -> i16x16 { + let count4: u64 = (count[0] as u16) as u64; + let count3: u64 = ((count[1] as u16) as u64) * 65536; + let count2: u64 = ((count[2] as u16) as u64) * 4294967296; + let count1: u64 = ((count[3] as u16) as u64) * 281474976710656; + let count = count1 + count2 + count3 + count4; + i16x16::from_fn(|i| { + if count > 15 { + 0 + } else { + ((a[i] as u16) << count) as i16 + } + }) + } + + pub fn pslld(a: i32x8, count: i32x4) -> i32x8 { + let count: u64 = ((count[1] as u32) as u64) * 4294967296 + ((count[0] as u32) as u64); + + i32x8::from_fn(|i| { + if count > 31 { + 0 + } else { + ((a[i] as u32) << count) as i32 + } + }) + } + pub fn psllq(a: i64x4, count: i64x2) -> i64x4 { + let count: u64 = count[0] as u64; + + i64x4::from_fn(|i| { + if count > 63 { + 0 + 
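                // shift counts above 63 produce zero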
} else { + ((a[i] as u64) << count) as i64 + } + }) + } + + pub fn psllvd(a: i32x4, count: i32x4) -> i32x4 { + i32x4::from_fn(|i| { + if count[i] > 31 || count[i] < 0 { + 0 + } else { + ((a[i] as u32) << count[i]) as i32 + } + }) + } + pub fn psllvd256(a: i32x8, count: i32x8) -> i32x8 { + i32x8::from_fn(|i| { + if count[i] > 31 || count[i] < 0 { + 0 + } else { + ((a[i] as u32) << count[i]) as i32 + } + }) + } + + pub fn psllvq(a: i64x2, count: i64x2) -> i64x2 { + i64x2::from_fn(|i| { + if count[i] > 63 || count[i] < 0 { + 0 + } else { + ((a[i] as u64) << count[i]) as i64 + } + }) + } + pub fn psllvq256(a: i64x4, count: i64x4) -> i64x4 { + i64x4::from_fn(|i| { + if count[i] > 63 || count[i] < 0 { + 0 + } else { + ((a[i] as u64) << count[i]) as i64 + } + }) + } + + pub fn psraw(a: i16x16, count: i16x8) -> i16x16 { + let count: u64 = ((count[3] as u16) as u64) * 281474976710656 + + ((count[2] as u16) as u64) * 4294967296 + + ((count[1] as u16) as u64) * 65536 + + ((count[0] as u16) as u64); + + i16x16::from_fn(|i| { + if count > 15 { + if a[i] < 0 { + -1 + } else { + 0 + } + } else { + a[i] >> count + } + }) + } + + pub fn psrad(a: i32x8, count: i32x4) -> i32x8 { + let count: u64 = ((count[1] as u32) as u64) * 4294967296 + ((count[0] as u32) as u64); + + i32x8::from_fn(|i| { + if count > 31 { + if a[i] < 0 { + -1 + } else { + 0 + } + } else { + a[i] << count + } + }) + } + + pub fn psravd(a: i32x4, count: i32x4) -> i32x4 { + i32x4::from_fn(|i| { + if count[i] > 31 || count[i] < 0 { + if a[i] < 0 { + -1 + } else { + 0 + } + } else { + a[i] >> count[i] + } + }) + } + + pub fn psravd256(a: i32x8, count: i32x8) -> i32x8 { + dbg!(a, count); + i32x8::from_fn(|i| { + if count[i] > 31 || count[i] < 0 { + if a[i] < 0 { + -1 + } else { + 0 + } + } else { + a[i] >> count[i] + } + }) + } + + pub fn psrlw(a: i16x16, count: i16x8) -> i16x16 { + let count: u64 = (count[3] as u16 as u64) * 281474976710656 + + (count[2] as u16 as u64) * 4294967296 + + (count[1] as u16 as u64) * 65536 + + (count[0] as u16 as u64); + + i16x16::from_fn(|i| { + if count > 15 { + 0 + } else { + ((a[i] as u16) >> count) as i16 + } + }) + } + + pub fn psrld(a: i32x8, count: i32x4) -> i32x8 { + let count: u64 = (count[1] as u32 as u64) * 4294967296 + (count[0] as u32 as u64); + + i32x8::from_fn(|i| { + if count > 31 { + 0 + } else { + ((a[i] as u32) >> count) as i32 + } + }) + } + + pub fn psrlq(a: i64x4, count: i64x2) -> i64x4 { + let count: u64 = count[0] as u64; + + i64x4::from_fn(|i| { + if count > 63 { + 0 + } else { + ((a[i] as u64) >> count) as i64 + } + }) + } + + pub fn psrlvd(a: i32x4, count: i32x4) -> i32x4 { + i32x4::from_fn(|i| { + if count[i] > 31 || count[i] < 0 { + 0 + } else { + ((a[i] as u32) >> count[i]) as i32 + } + }) + } + pub fn psrlvd256(a: i32x8, count: i32x8) -> i32x8 { + i32x8::from_fn(|i| { + if count[i] > 31 || count[i] < 0 { + 0 + } else { + ((a[i] as u32) >> count[i]) as i32 + } + }) + } + + pub fn psrlvq(a: i64x2, count: i64x2) -> i64x2 { + i64x2::from_fn(|i| { + if count[i] > 63 || count[i] < 0 { + 0 + } else { + ((a[i] as u64) >> count[i]) as i64 + } + }) + } + pub fn psrlvq256(a: i64x4, count: i64x4) -> i64x4 { + i64x4::from_fn(|i| { + if count[i] > 63 || count[i] < 0 { + 0 + } else { + ((a[i] as u64) >> count[i]) as i64 + } + }) + } + + pub fn pshufb(a: u8x32, b: u8x32) -> u8x32 { + u8x32::from_fn(|i| { + if i < 16 { + if b[i] > 127 { + 0 + } else { + let index: u64 = (b[i] % 16) as u64; + a[index] + } + } else { + if b[i] > 127 { + 0 + } else { + let index: u64 = (b[i] % 16) as u64; + a[index + 
16] + } + } + }) + } + + pub fn permd(a: u32x8, b: u32x8) -> u32x8 { + u32x8::from_fn(|i| { + let id = b[i] % 8; + a[id as u64] + }) + } + + pub fn mpsadbw(a: u8x32, b: u8x32, imm8: i32) -> u16x16 { + u16x16::from_fn(|i| { + if i < 8 { + let a_offset = (((imm8 & 4) >> 2) * 4) as u32 as u64; + let b_offset = ((imm8 & 3) * 4) as u32 as u64; + let k = a_offset + i; + let l = b_offset; + ((a[k].absolute_diff(b[l]) as i8) as u8 as u16) + + ((a[k + 1].absolute_diff(b[l + 1]) as i8) as u8 as u16) + + ((a[k + 2].absolute_diff(b[l + 2]) as i8) as u8 as u16) + + ((a[k + 3].absolute_diff(b[l + 3]) as i8) as u8 as u16) + } else { + let i = i - 8; + let imm8 = imm8 >> 3; + let a_offset = (((imm8 & 4) >> 2) * 4) as u32 as u64; + let b_offset = ((imm8 & 3) * 4) as u32 as u64; + let k = a_offset + i; + let l = b_offset; + ((a[16 + k].absolute_diff(b[16 + l]) as i8) as u8 as u16) + + ((a[16 + k + 1].absolute_diff(b[16 + l + 1]) as i8) as u8 as u16) + + ((a[16 + k + 2].absolute_diff(b[16 + l + 2]) as i8) as u8 as u16) + + ((a[16 + k + 3].absolute_diff(b[16 + l + 3]) as i8) as u8 as u16) + } + }) + } + + pub fn vperm2i128(a: i64x4, b: i64x4, imm8: i8) -> i64x4 { + let a = i128x2::from_fn(|i| { + ((a[2 * i] as u64 as u128) + ((a[2 * i + 1] as u64 as u128) << 64)) as i128 + }); + let b = i128x2::from_fn(|i| { + ((b[2 * i] as u64 as u128) + ((b[2 * i + 1] as u64 as u128) << 64)) as i128 + }); + let imm8 = imm8 as u8 as u32 as i32; + let r = i128x2::from_fn(|i| { + let control = imm8 >> (i * 4); + if (control >> 3) % 2 == 1 { + 0 + } else { + match control % 4 { + 0 => a[0], + 1 => a[1], + 2 => b[0], + 3 => b[1], + _ => unreachable!(), + } + } + }); + i64x4::from_fn(|i| { + let index = i >> 1; + let hilo = i.rem_euclid(2); + let val = r[index]; + if hilo == 0 { + i64::cast(val) + } else { + i64::cast(val >> 64) + } + }) + } + pub fn pmulhrsw(a: i16x16, b: i16x16) -> i16x16 { + i16x16::from_fn(|i| { + let temp = (a[i] as i32) * (b[i] as i32); + let temp = (temp >> 14).wrapping_add(1) >> 1; + temp as i16 + }) + } + + pub fn psadbw(a: u8x32, b: u8x32) -> u64x4 { + let tmp = u8x32::from_fn(|i| a[i].absolute_diff(b[i])); + u64x4::from_fn(|i| { + (tmp[i * 8] as u16) + .wrapping_add(tmp[i * 8 + 1] as u16) + .wrapping_add(tmp[i * 8 + 2] as u16) + .wrapping_add(tmp[i * 8 + 3] as u16) + .wrapping_add(tmp[i * 8 + 4] as u16) + .wrapping_add(tmp[i * 8 + 5] as u16) + .wrapping_add(tmp[i * 8 + 6] as u16) + .wrapping_add(tmp[i * 8 + 7] as u16) as u64 + }) + } +} +use c_extern::*; + +use super::avx::*; +use super::types::*; +use crate::abstractions::simd::*; +/// Computes the absolute values of packed 32-bit integers in `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_abs_epi32) + +pub fn _mm256_abs_epi32(a: __m256i) -> __m256i { + let a = BitVec::to_i32x8(a); + let r = simd_select(simd_lt(a, i32x8::from_fn(|_| 0)), simd_neg(a), a); + BitVec::from_i32x8(r) +} + +/// Computes the absolute values of packed 16-bit integers in `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_abs_epi16) + +pub fn _mm256_abs_epi16(a: __m256i) -> __m256i { + let a = BitVec::to_i16x16(a); + let r = simd_select(simd_lt(a, i16x16::from_fn(|_| 0)), simd_neg(a), a); + BitVec::from_i16x16(r) +} + +/// Computes the absolute values of packed 8-bit integers in `a`. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_abs_epi8) + +pub fn _mm256_abs_epi8(a: __m256i) -> __m256i { + let a = BitVec::to_i8x32(a); + let r = simd_select(simd_lt(a, i8x32::from_fn(|_| 0)), simd_neg(a), a); + BitVec::from_i8x32(r) +} + +/// Adds packed 64-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_add_epi64) + +pub fn _mm256_add_epi64(a: __m256i, b: __m256i) -> __m256i { + BitVec::from_i64x4(simd_add(BitVec::to_i64x4(a), BitVec::to_i64x4(b))) +} + +/// Adds packed 32-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_add_epi32) + +pub fn _mm256_add_epi32(a: __m256i, b: __m256i) -> __m256i { + BitVec::from_i32x8(simd_add(BitVec::to_i32x8(a), BitVec::to_i32x8(b))) +} + +/// Adds packed 16-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_add_epi16) + +pub fn _mm256_add_epi16(a: __m256i, b: __m256i) -> __m256i { + BitVec::from_i16x16(simd_add(BitVec::to_i16x16(a), BitVec::to_i16x16(b))) +} + +/// Adds packed 8-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_add_epi8) + +pub fn _mm256_add_epi8(a: __m256i, b: __m256i) -> __m256i { + BitVec::from_i8x32(simd_add(BitVec::to_i8x32(a), BitVec::to_i8x32(b))) +} + +/// Adds packed 8-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_adds_epi8) + +pub fn _mm256_adds_epi8(a: __m256i, b: __m256i) -> __m256i { + BitVec::from_i8x32(simd_saturating_add( + BitVec::to_i8x32(a), + BitVec::to_i8x32(b), + )) +} + +/// Adds packed 16-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_adds_epi16) + +pub fn _mm256_adds_epi16(a: __m256i, b: __m256i) -> __m256i { + BitVec::from_i16x16(simd_saturating_add( + BitVec::to_i16x16(a), + BitVec::to_i16x16(b), + )) +} + +/// Adds packed unsigned 8-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_adds_epu8) + +pub fn _mm256_adds_epu8(a: __m256i, b: __m256i) -> __m256i { + simd_saturating_add(BitVec::to_u8x32(a), BitVec::to_u8x32(b)).into() +} + +/// Adds packed unsigned 16-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_adds_epu16) + +pub fn _mm256_adds_epu16(a: __m256i, b: __m256i) -> __m256i { + simd_saturating_add(BitVec::to_u16x16(a), BitVec::to_u16x16(b)).into() +} + +/// Concatenates pairs of 16-byte blocks in `a` and `b` into a 32-byte temporary +/// result, shifts the result right by `n` bytes, and returns the low 16 bytes. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_alignr_epi8) + +pub fn _mm256_alignr_epi8(a: __m256i, b: __m256i) -> __m256i { + // If palignr is shifting the pair of vectors more than the size of two + // lanes, emit zero. 
+ if IMM8 >= 32 { + return _mm256_setzero_si256(); + } + // If palignr is shifting the pair of input vectors more than one lane, + // but less than two lanes, convert to shifting in zeroes. + let (a, b) = if IMM8 > 16 { + (_mm256_setzero_si256(), a) + } else { + (a, b) + }; + + let a = BitVec::to_i8x32(a); + let b = BitVec::to_i8x32(b); + + if IMM8 == 16 { + return a.into(); + } + + let r: i8x32 = match IMM8 % 16 { + 0 => simd_shuffle( + b, + a, + [ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, + 23, 24, 25, 26, 27, 28, 29, 30, 31, + ], + ), + 1 => simd_shuffle( + b, + a, + [ + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, 48, + ], + ), + 2 => simd_shuffle( + b, + a, + [ + 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 18, 19, 20, 21, 22, 23, 24, + 25, 26, 27, 28, 29, 30, 31, 48, 49, + ], + ), + 3 => simd_shuffle( + b, + a, + [ + 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 19, 20, 21, 22, 23, 24, + 25, 26, 27, 28, 29, 30, 31, 48, 49, 50, + ], + ), + 4 => simd_shuffle( + b, + a, + [ + 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 20, 21, 22, 23, 24, 25, + 26, 27, 28, 29, 30, 31, 48, 49, 50, 51, + ], + ), + 5 => simd_shuffle( + b, + a, + [ + 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 21, 22, 23, 24, 25, 26, + 27, 28, 29, 30, 31, 48, 49, 50, 51, 52, + ], + ), + 6 => simd_shuffle( + b, + a, + [ + 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 22, 23, 24, 25, 26, 27, + 28, 29, 30, 31, 48, 49, 50, 51, 52, 53, + ], + ), + 7 => simd_shuffle( + b, + a, + [ + 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 23, 24, 25, 26, 27, + 28, 29, 30, 31, 48, 49, 50, 51, 52, 53, 54, + ], + ), + 8 => simd_shuffle( + b, + a, + [ + 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 24, 25, 26, 27, 28, + 29, 30, 31, 48, 49, 50, 51, 52, 53, 54, 55, + ], + ), + 9 => simd_shuffle( + b, + a, + [ + 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 25, 26, 27, 28, 29, + 30, 31, 48, 49, 50, 51, 52, 53, 54, 55, 56, + ], + ), + 10 => simd_shuffle( + b, + a, + [ + 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 26, 27, 28, 29, 30, + 31, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, + ], + ), + 11 => simd_shuffle( + b, + a, + [ + 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 27, 28, 29, 30, 31, + 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, + ], + ), + 12 => simd_shuffle( + b, + a, + [ + 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 28, 29, 30, 31, 48, + 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, + ], + ), + 13 => simd_shuffle( + b, + a, + [ + 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 29, 30, 31, 48, 49, + 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, + ], + ), + 14 => simd_shuffle( + b, + a, + [ + 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 30, 31, 48, 49, 50, + 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, + ], + ), + 15 => simd_shuffle( + b, + a, + [ + 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 31, 48, 49, 50, 51, + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, + ], + ), + _ => unreachable!(), + }; + r.into() +} + +/// Computes the bitwise AND of 256 bits (representing integer data) +/// in `a` and `b`. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_and_si256) + +pub fn _mm256_and_si256(a: __m256i, b: __m256i) -> __m256i { + simd_and(BitVec::to_i64x4(a), BitVec::to_i64x4(b)).into() +} + +/// Computes the bitwise NOT of 256 bits (representing integer data) +/// in `a` and then AND with `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_andnot_si256) + +pub fn _mm256_andnot_si256(a: __m256i, b: __m256i) -> __m256i { + let all_ones = _mm256_set1_epi8(-1); + simd_and( + simd_xor(BitVec::to_i64x4(a), BitVec::to_i64x4(all_ones)), + BitVec::to_i64x4(b), + ) + .into() +} + +/// Averages packed unsigned 16-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_avg_epu16) + +pub fn _mm256_avg_epu16(a: __m256i, b: __m256i) -> __m256i { + let a = simd_cast::<16, _, u32>(BitVec::to_u16x16(a)); + let b = simd_cast::<16, _, u32>(BitVec::to_u16x16(b)); + let r = simd_shr(simd_add(simd_add(a, b), u32x16::splat(1)), u32x16::splat(1)); + simd_cast::<16, _, u16>(r).into() +} + +/// Averages packed unsigned 8-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_avg_epu8) + +pub fn _mm256_avg_epu8(a: __m256i, b: __m256i) -> __m256i { + let a = simd_cast::<32, _, u16>(BitVec::to_u8x32(a)); + let b = simd_cast::<32, _, u16>(BitVec::to_u8x32(b)); + let r = simd_shr(simd_add(simd_add(a, b), u16x32::splat(1)), u16x32::splat(1)); + simd_cast::<32, _, u8>(r).into() +} + +/// Blends packed 32-bit integers from `a` and `b` using control mask `IMM4`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_blend_epi32) + +pub fn _mm_blend_epi32(a: __m128i, b: __m128i) -> __m128i { + let a = BitVec::to_i32x4(a); + let b = BitVec::to_i32x4(b); + let r: i32x4 = simd_shuffle( + a, + b, + [ + [0, 4, 0, 4][IMM4 as usize & 0b11], + [1, 1, 5, 5][IMM4 as usize & 0b11], + [2, 6, 2, 6][(IMM4 as usize >> 2) & 0b11], + [3, 3, 7, 7][(IMM4 as usize >> 2) & 0b11], + ], + ); + r.into() +} + +/// Blends packed 32-bit integers from `a` and `b` using control mask `IMM8`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_blend_epi32) + +pub fn _mm256_blend_epi32(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i32x8(a); + let b = BitVec::to_i32x8(b); + let r: i32x8 = simd_shuffle( + a, + b, + [ + [0, 8, 0, 8][IMM8 as usize & 0b11], + [1, 1, 9, 9][IMM8 as usize & 0b11], + [2, 10, 2, 10][(IMM8 as usize >> 2) & 0b11], + [3, 3, 11, 11][(IMM8 as usize >> 2) & 0b11], + [4, 12, 4, 12][(IMM8 as usize >> 4) & 0b11], + [5, 5, 13, 13][(IMM8 as usize >> 4) & 0b11], + [6, 14, 6, 14][(IMM8 as usize >> 6) & 0b11], + [7, 7, 15, 15][(IMM8 as usize >> 6) & 0b11], + ], + ); + r.into() +} + +/// Blends packed 16-bit integers from `a` and `b` using control mask `IMM8`. 
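+///
+/// Picturing `a` and `b` as `[i16; 16]`, the blend is logically equivalent to
+/// the following scalar sketch (an illustration only, not the model defined
+/// below): bit `i % 8` of `IMM8` selects `b[i]` when set and `a[i]` otherwise,
+/// i.e. the same 8-bit mask is applied to both 128-bit lanes.
+///
+/// ```
+/// fn mm256_blend_epi16(a: [i16; 16], b: [i16; 16], imm8: u8) -> [i16; 16] {
+///     let mut r = [0i16; 16];
+///     for i in 0..16 {
+///         r[i] = if (imm8 >> (i % 8)) & 1 == 1 { b[i] } else { a[i] };
+///     }
+///     r
+/// }
+/// ```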
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_blend_epi16) +pub fn _mm256_blend_epi16(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i16x16(a); + let b = BitVec::to_i16x16(b); + + let r: i16x16 = simd_shuffle( + a, + b, + [ + [0, 16, 0, 16][IMM8 as usize & 0b11], + [1, 1, 17, 17][IMM8 as usize & 0b11], + [2, 18, 2, 18][(IMM8 as usize >> 2) & 0b11], + [3, 3, 19, 19][(IMM8 as usize >> 2) & 0b11], + [4, 20, 4, 20][(IMM8 as usize >> 4) & 0b11], + [5, 5, 21, 21][(IMM8 as usize >> 4) & 0b11], + [6, 22, 6, 22][(IMM8 as usize >> 6) & 0b11], + [7, 7, 23, 23][(IMM8 as usize >> 6) & 0b11], + [8, 24, 8, 24][IMM8 as usize & 0b11], + [9, 9, 25, 25][IMM8 as usize & 0b11], + [10, 26, 10, 26][(IMM8 as usize >> 2) & 0b11], + [11, 11, 27, 27][(IMM8 as usize >> 2) & 0b11], + [12, 28, 12, 28][(IMM8 as usize >> 4) & 0b11], + [13, 13, 29, 29][(IMM8 as usize >> 4) & 0b11], + [14, 30, 14, 30][(IMM8 as usize >> 6) & 0b11], + [15, 15, 31, 31][(IMM8 as usize >> 6) & 0b11], + ], + ); + r.into() +} + +/// Blends packed 8-bit integers from `a` and `b` using `mask`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_blendv_epi8) +pub fn _mm256_blendv_epi8(a: __m256i, b: __m256i, mask: __m256i) -> __m256i { + let mask: i8x32 = simd_lt(BitVec::to_i8x32(mask), i8x32::from_fn(|_| 0)); + simd_select(mask, BitVec::to_i8x32(b), BitVec::to_i8x32(a)).into() +} + +/// Broadcasts the low packed 8-bit integer from `a` to all elements of +/// the 128-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_broadcastb_epi8) +pub fn _mm_broadcastb_epi8(a: __m128i) -> __m128i { + let ret = simd_shuffle(BitVec::to_i8x16(a), i8x16::from_fn(|_| 0), [0_u64; 16]); + ret.into() +} + +/// Broadcasts the low packed 8-bit integer from `a` to all elements of +/// the 256-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_broadcastb_epi8) +pub fn _mm256_broadcastb_epi8(a: __m128i) -> __m256i { + let ret = simd_shuffle(BitVec::to_i8x16(a), i8x16::from_fn(|_| 0), [0_u64; 32]); + ret.into() +} + +// N.B., `simd_shuffle4` with integer data types for `a` and `b` is +// often compiled to `vbroadcastss`. +/// Broadcasts the low packed 32-bit integer from `a` to all elements of +/// the 128-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_broadcastd_epi32) + +pub fn _mm_broadcastd_epi32(a: __m128i) -> __m128i { + let ret = simd_shuffle(BitVec::to_i32x4(a), i32x4::from_fn(|_| 0), [0_u64; 4]); + ret.into() +} + +// N.B., `simd_shuffle4`` with integer data types for `a` and `b` is +// often compiled to `vbroadcastss`. +/// Broadcasts the low packed 32-bit integer from `a` to all elements of +/// the 256-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_broadcastd_epi32) + +pub fn _mm256_broadcastd_epi32(a: __m128i) -> __m256i { + let ret = simd_shuffle(BitVec::to_i32x4(a), i32x4::from_fn(|_| 0), [0_u64; 8]); + ret.into() +} + +/// Broadcasts the low packed 64-bit integer from `a` to all elements of +/// the 128-bit returned value. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_broadcastq_epi64) + +// Emits `vmovddup` instead of `vpbroadcastq` +// See https://github.com/rust-lang/stdarch/issues/791 + +pub fn _mm_broadcastq_epi64(a: __m128i) -> __m128i { + let ret = simd_shuffle(BitVec::to_i64x2(a), BitVec::to_i64x2(a), [0_u64; 2]); + ret.into() +} + +/// Broadcasts the low packed 64-bit integer from `a` to all elements of +/// the 256-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_broadcastq_epi64) + +pub fn _mm256_broadcastq_epi64(a: __m128i) -> __m256i { + let ret = simd_shuffle(BitVec::to_i64x2(a), BitVec::to_i64x2(a), [0_u64; 4]); + ret.into() +} + +/// Broadcasts 128 bits of integer data from a to all 128-bit lanes in +/// the 256-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_broadcastsi128_si256) + +pub fn _mm_broadcastsi128_si256(a: __m128i) -> __m256i { + let ret = simd_shuffle(BitVec::to_i64x2(a), i64x2::from_fn(|_| 0), [0, 1, 0, 1]); + ret.into() +} + +// N.B., `broadcastsi128_si256` is often compiled to `vinsertf128` or +// `vbroadcastf128`. +/// Broadcasts 128 bits of integer data from a to all 128-bit lanes in +/// the 256-bit returned value. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_broadcastsi128_si256) + +pub fn _mm256_broadcastsi128_si256(a: __m128i) -> __m256i { + let ret = simd_shuffle(BitVec::to_i64x2(a), i64x2::from_fn(|_| 0), [0, 1, 0, 1]); + ret.into() +} + +/// Broadcasts the low packed 16-bit integer from a to all elements of +/// the 128-bit returned value +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_broadcastw_epi16) + +pub fn _mm_broadcastw_epi16(a: __m128i) -> __m128i { + let ret = simd_shuffle(BitVec::to_i16x8(a), i16x8::from_fn(|_| 0), [0_u64; 8]); + ret.into() +} + +/// Broadcasts the low packed 16-bit integer from a to all elements of +/// the 256-bit returned value +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_broadcastw_epi16) + +pub fn _mm256_broadcastw_epi16(a: __m128i) -> __m256i { + let ret = simd_shuffle(BitVec::to_i16x8(a), i16x8::from_fn(|_| 0), [0_u64; 16]); + ret.into() +} + +/// Compares packed 64-bit integers in `a` and `b` for equality. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpeq_epi64) + +pub fn _mm256_cmpeq_epi64(a: __m256i, b: __m256i) -> __m256i { + simd_eq(BitVec::to_i64x4(a), BitVec::to_i64x4(b)).into() +} + +/// Compares packed 32-bit integers in `a` and `b` for equality. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpeq_epi32) + +pub fn _mm256_cmpeq_epi32(a: __m256i, b: __m256i) -> __m256i { + simd_eq(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Compares packed 16-bit integers in `a` and `b` for equality. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpeq_epi16) + +pub fn _mm256_cmpeq_epi16(a: __m256i, b: __m256i) -> __m256i { + simd_eq(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Compares packed 8-bit integers in `a` and `b` for equality. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpeq_epi8) + +pub fn _mm256_cmpeq_epi8(a: __m256i, b: __m256i) -> __m256i { + simd_eq(BitVec::to_i8x32(a), BitVec::to_i8x32(b)).into() +} + +/// Compares packed 64-bit integers in `a` and `b` for greater-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpgt_epi64) + +pub fn _mm256_cmpgt_epi64(a: __m256i, b: __m256i) -> __m256i { + simd_gt(BitVec::to_i64x4(a), BitVec::to_i64x4(b)).into() +} + +/// Compares packed 32-bit integers in `a` and `b` for greater-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpgt_epi32) + +pub fn _mm256_cmpgt_epi32(a: __m256i, b: __m256i) -> __m256i { + simd_gt(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Compares packed 16-bit integers in `a` and `b` for greater-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpgt_epi16) + +pub fn _mm256_cmpgt_epi16(a: __m256i, b: __m256i) -> __m256i { + simd_gt(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Compares packed 8-bit integers in `a` and `b` for greater-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmpgt_epi8) + +pub fn _mm256_cmpgt_epi8(a: __m256i, b: __m256i) -> __m256i { + simd_gt(BitVec::to_i8x32(a), BitVec::to_i8x32(b)).into() +} + +/// Sign-extend 16-bit integers to 32-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepi16_epi32) + +pub fn _mm256_cvtepi16_epi32(a: __m128i) -> __m256i { + simd_cast::<8, _, i32>(BitVec::to_i16x8(a)).into() +} + +/// Sign-extend 16-bit integers to 64-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepi16_epi64) + +pub fn _mm256_cvtepi16_epi64(a: __m128i) -> __m256i { + let a = BitVec::to_i16x8(a); + let v64: i16x4 = simd_shuffle(a, a, [0, 1, 2, 3]); + simd_cast::<4, i16, i64>(v64).into() +} + +/// Sign-extend 32-bit integers to 64-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepi32_epi64) + +pub fn _mm256_cvtepi32_epi64(a: __m128i) -> __m256i { + simd_cast::<4, i32, i64>(BitVec::to_i32x4(a)).into() +} + +/// Sign-extend 8-bit integers to 16-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepi8_epi16) + +pub fn _mm256_cvtepi8_epi16(a: __m128i) -> __m256i { + simd_cast::<16, i8, i16>(BitVec::to_i8x16(a)).into() +} + +/// Sign-extend 8-bit integers to 32-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepi8_epi32) + +pub fn _mm256_cvtepi8_epi32(a: __m128i) -> __m256i { + let a = BitVec::to_i8x16(a); + let v64: i8x8 = simd_shuffle(a, a, [0, 1, 2, 3, 4, 5, 6, 7]); + simd_cast::<8, i8, i32>(v64).into() +} + +/// Sign-extend 8-bit integers to 64-bit integers. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepi8_epi64) +pub fn _mm256_cvtepi8_epi64(a: __m128i) -> __m256i { + let a = BitVec::to_i8x16(a); + let v32: i8x4 = simd_shuffle(a, a, [0, 1, 2, 3]); + simd_cast::<4, i8, i64>(v32).into() +} + +/// Zeroes extend packed unsigned 16-bit integers in `a` to packed 32-bit +/// integers, and stores the results in `dst`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepu16_epi32) + +pub fn _mm256_cvtepu16_epi32(a: __m128i) -> __m256i { + simd_cast::<8, u16, u32>(BitVec::to_u16x8(a)).into() +} + +/// Zero-extend the lower four unsigned 16-bit integers in `a` to 64-bit +/// integers. The upper four elements of `a` are unused. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepu16_epi64) + +pub fn _mm256_cvtepu16_epi64(a: __m128i) -> __m256i { + let a = BitVec::to_u16x8(a); + let v64: u16x4 = simd_shuffle(a, a, [0, 1, 2, 3]); + simd_cast::<4, u16, u64>(v64).into() +} + +/// Zero-extend unsigned 32-bit integers in `a` to 64-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepu32_epi64) + +pub fn _mm256_cvtepu32_epi64(a: __m128i) -> __m256i { + simd_cast::<4, u32, u64>(BitVec::to_u32x4(a)).into() +} + +/// Zero-extend unsigned 8-bit integers in `a` to 16-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepu8_epi16) + +pub fn _mm256_cvtepu8_epi16(a: __m128i) -> __m256i { + simd_cast::<16, u8, u16>(BitVec::to_u8x16(a)).into() +} + +/// Zero-extend the lower eight unsigned 8-bit integers in `a` to 32-bit +/// integers. The upper eight elements of `a` are unused. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepu8_epi32) + +pub fn _mm256_cvtepu8_epi32(a: __m128i) -> __m256i { + let a = BitVec::to_u8x16(a); + let v64: u8x8 = simd_shuffle(a, a, [0, 1, 2, 3, 4, 5, 6, 7]); + simd_cast::<8, u8, u32>(v64).into() +} + +/// Zero-extend the lower four unsigned 8-bit integers in `a` to 64-bit +/// integers. The upper twelve elements of `a` are unused. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cvtepu8_epi64) + +pub fn _mm256_cvtepu8_epi64(a: __m128i) -> __m256i { + let a = BitVec::to_u8x16(a); + let v32: u8x4 = simd_shuffle(a, a, [0, 1, 2, 3]); + simd_cast::<4, u8, u64>(v32).into() +} + +/// Extracts 128 bits (of integer data) from `a` selected with `IMM1`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_extracti128_si256) + +pub fn _mm256_extracti128_si256(a: __m256i) -> __m128i { + let a = BitVec::to_i64x4(a); + let b = i64x4::from_fn(|_| 0); + let dst: i64x2 = simd_shuffle(a, b, [[0, 1], [2, 3]][IMM1 as usize]); + dst.into() +} + +/// Horizontally adds adjacent pairs of 16-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hadd_epi16) + +pub fn _mm256_hadd_epi16(a: __m256i, b: __m256i) -> __m256i { + phaddw(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Horizontally adds adjacent pairs of 32-bit integers in `a` and `b`. 
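+///
+/// As an illustrative scalar sketch (not the model defined below), each
+/// 128-bit lane of the result is `[a0+a1, a2+a3, b0+b1, b2+b3]`, built from
+/// the corresponding lane of `a` and `b` with wrapping addition:
+///
+/// ```
+/// fn mm256_hadd_epi32(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
+///     let mut r = [0i32; 8];
+///     for lane in 0..2 {
+///         let o = 4 * lane; // offset of this 128-bit lane
+///         for j in 0..2 {
+///             r[o + j] = a[o + 2 * j].wrapping_add(a[o + 2 * j + 1]);
+///             r[o + 2 + j] = b[o + 2 * j].wrapping_add(b[o + 2 * j + 1]);
+///         }
+///     }
+///     r
+/// }
+/// ```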
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hadd_epi32) + +pub fn _mm256_hadd_epi32(a: __m256i, b: __m256i) -> __m256i { + phaddd(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Horizontally adds adjacent pairs of 16-bit integers in `a` and `b` +/// using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hadds_epi16) + +pub fn _mm256_hadds_epi16(a: __m256i, b: __m256i) -> __m256i { + phaddsw(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Horizontally subtract adjacent pairs of 16-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hsub_epi16) + +pub fn _mm256_hsub_epi16(a: __m256i, b: __m256i) -> __m256i { + phsubw(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Horizontally subtract adjacent pairs of 32-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hsub_epi32) + +pub fn _mm256_hsub_epi32(a: __m256i, b: __m256i) -> __m256i { + phsubd(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Horizontally subtract adjacent pairs of 16-bit integers in `a` and `b` +/// using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hsubs_epi16) + +pub fn _mm256_hsubs_epi16(a: __m256i, b: __m256i) -> __m256i { + phsubsw(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Copies `a` to `dst`, then insert 128 bits (of integer data) from `b` at the +/// location specified by `IMM1`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_inserti128_si256) + +pub fn _mm256_inserti128_si256(a: __m256i, b: __m128i) -> __m256i { + let a = BitVec::to_i64x4(a); + let b = BitVec::to_i64x4(_mm256_castsi128_si256(b)); + let dst: i64x4 = simd_shuffle(a, b, [[4, 5, 2, 3], [0, 1, 4, 5]][IMM1 as usize]); + dst.into() +} + +/// Multiplies packed signed 16-bit integers in `a` and `b`, producing +/// intermediate signed 32-bit integers. Horizontally add adjacent pairs +/// of intermediate 32-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_madd_epi16) + +pub fn _mm256_madd_epi16(a: __m256i, b: __m256i) -> __m256i { + pmaddwd(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Vertically multiplies each unsigned 8-bit integer from `a` with the +/// corresponding signed 8-bit integer from `b`, producing intermediate +/// signed 16-bit integers. Horizontally add adjacent pairs of intermediate +/// signed 16-bit integers +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_maddubs_epi16) + +pub fn _mm256_maddubs_epi16(a: __m256i, b: __m256i) -> __m256i { + pmaddubsw(BitVec::to_u8x32(a), BitVec::to_u8x32(b)).into() +} + +/// Compares packed 16-bit integers in `a` and `b`, and returns the packed +/// maximum values. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epi16) + +pub fn _mm256_max_epi16(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i16x16(a); + let b = BitVec::to_i16x16(b); + simd_select::<16, i16, _>(simd_gt(a, b), a, b).into() +} + +/// Compares packed 32-bit integers in `a` and `b`, and returns the packed +/// maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epi32) + +pub fn _mm256_max_epi32(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i32x8(a); + let b = BitVec::to_i32x8(b); + simd_select::<8, i32, _>(simd_gt(a, b), a, b).into() +} + +/// Compares packed 8-bit integers in `a` and `b`, and returns the packed +/// maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epi8) + +pub fn _mm256_max_epi8(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i8x32(a); + let b = BitVec::to_i8x32(b); + simd_select::<32, i8, _>(simd_gt(a, b), a, b).into() +} + +/// Compares packed unsigned 16-bit integers in `a` and `b`, and returns +/// the packed maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epu16) + +pub fn _mm256_max_epu16(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u16x16(a); + let b = BitVec::to_u16x16(b); + simd_select::<16, _, u16>(simd_gt(a, b), a, b).into() +} + +/// Compares packed unsigned 32-bit integers in `a` and `b`, and returns +/// the packed maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epu32) + +pub fn _mm256_max_epu32(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u32x8(a); + let b = BitVec::to_u32x8(b); + simd_select::<8, _, u32>(simd_gt(a, b), a, b).into() +} + +/// Compares packed unsigned 8-bit integers in `a` and `b`, and returns +/// the packed maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epu8) + +pub fn _mm256_max_epu8(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u8x32(a); + let b = BitVec::to_u8x32(b); + simd_select::<32, _, u8>(simd_gt(a, b), a, b).into() +} + +/// Compares packed 16-bit integers in `a` and `b`, and returns the packed +/// minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_min_epi16) + +pub fn _mm256_min_epi16(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i16x16(a); + let b = BitVec::to_i16x16(b); + simd_select::<16, _, i16>(simd_lt(a, b), a, b).into() +} + +/// Compares packed 32-bit integers in `a` and `b`, and returns the packed +/// minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_min_epi32) + +pub fn _mm256_min_epi32(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i32x8(a); + let b = BitVec::to_i32x8(b); + simd_select::<8, i32, _>(simd_lt(a, b), a, b).into() +} + +/// Compares packed 8-bit integers in `a` and `b`, and returns the packed +/// minimum values. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_min_epi8) + +pub fn _mm256_min_epi8(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_i8x32(a); + let b = BitVec::to_i8x32(b); + simd_select::<32, i8, _>(simd_lt(a, b), a, b).into() +} + +/// Compares packed unsigned 16-bit integers in `a` and `b`, and returns +/// the packed minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_min_epu16) + +pub fn _mm256_min_epu16(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u16x16(a); + let b = BitVec::to_u16x16(b); + simd_select::<16, _, u16>(simd_lt(a, b), a, b).into() +} + +/// Compares packed unsigned 32-bit integers in `a` and `b`, and returns +/// the packed minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_min_epu32) + +pub fn _mm256_min_epu32(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u32x8(a); + let b = BitVec::to_u32x8(b); + simd_select::<8, _, u32>(simd_lt(a, b), a, b).into() +} + +/// Compares packed unsigned 8-bit integers in `a` and `b`, and returns +/// the packed minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_min_epu8) + +pub fn _mm256_min_epu8(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u8x32(a); + let b = BitVec::to_u8x32(b); + simd_select::<32, _, u8>(simd_lt(a, b), a, b).into() +} + +/// Creates mask from the most significant bit of each 8-bit element in `a`, +/// return the result. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_movemask_epi8) + +pub fn _mm256_movemask_epi8(a: __m256i) -> i32 { + let z = i8x32::from_fn(|_| 0); + let m: i8x32 = simd_lt(BitVec::to_i8x32(a), z); + let r = simd_bitmask_little!(31, m, u32); + r as i32 +} + +/// Computes the sum of absolute differences (SADs) of quadruplets of unsigned +/// 8-bit integers in `a` compared to those in `b`, and stores the 16-bit +/// results in dst. Eight SADs are performed for each 128-bit lane using one +/// quadruplet from `b` and eight quadruplets from `a`. One quadruplet is +/// selected from `b` starting at on the offset specified in `imm8`. Eight +/// quadruplets are formed from sequential 8-bit integers selected from `a` +/// starting at the offset specified in `imm8`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mpsadbw_epu8) + +pub fn _mm256_mpsadbw_epu8(a: __m256i, b: __m256i) -> __m256i { + mpsadbw(BitVec::to_u8x32(a), BitVec::to_u8x32(b), IMM8).into() +} + +/// Multiplies the low 32-bit integers from each packed 64-bit element in +/// `a` and `b` +/// +/// Returns the 64-bit results. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mul_epi32) + +pub fn _mm256_mul_epi32(a: __m256i, b: __m256i) -> __m256i { + let a = simd_cast::<4, _, i64>(simd_cast::<4, _, i32>(BitVec::to_i64x4(a))); + let b = simd_cast::<4, _, i64>(simd_cast::<4, _, i32>(BitVec::to_i64x4(b))); + simd_mul(a, b).into() +} + +/// Multiplies the low unsigned 32-bit integers from each packed 64-bit +/// element in `a` and `b` +/// +/// Returns the unsigned 64-bit results. 
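+///
+/// Per 64-bit element this amounts to masking both operands to their low
+/// 32 bits and keeping the full 64-bit product (which cannot overflow); an
+/// illustrative per-element sketch:
+///
+/// ```
+/// fn mul_epu32_element(a: u64, b: u64) -> u64 {
+///     (a & 0xFFFF_FFFF) * (b & 0xFFFF_FFFF)
+/// }
+/// ```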
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mul_epu32) + +pub fn _mm256_mul_epu32(a: __m256i, b: __m256i) -> __m256i { + let a = BitVec::to_u64x4(a); + let b = BitVec::to_u64x4(b); + let mask = u64x4::splat(u32::MAX.into()); + BitVec::from_u64x4(simd_mul(simd_and(a, mask), simd_and(b, mask))) +} + +/// Multiplies the packed 16-bit integers in `a` and `b`, producing +/// intermediate 32-bit integers and returning the high 16 bits of the +/// intermediate integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mulhi_epi16) + +pub fn _mm256_mulhi_epi16(a: __m256i, b: __m256i) -> __m256i { + let a = simd_cast::<16, _, i32>(BitVec::to_i16x16(a)); + let b = simd_cast::<16, _, i32>(BitVec::to_i16x16(b)); + let r = simd_shr(simd_mul(a, b), i32x16::splat(16)); + simd_cast::<16, i32, i16>(r).into() +} + +/// Multiplies the packed unsigned 16-bit integers in `a` and `b`, producing +/// intermediate 32-bit integers and returning the high 16 bits of the +/// intermediate integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mulhi_epu16) + +pub fn _mm256_mulhi_epu16(a: __m256i, b: __m256i) -> __m256i { + let a = simd_cast::<16, _, u32>(BitVec::to_u16x16(a)); + let b = simd_cast::<16, _, u32>(BitVec::to_u16x16(b)); + let r = simd_shr(simd_mul(a, b), u32x16::splat(16)); + simd_cast::<16, u32, u16>(r).into() +} + +/// Multiplies the packed 16-bit integers in `a` and `b`, producing +/// intermediate 32-bit integers, and returns the low 16 bits of the +/// intermediate integers +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mullo_epi16) + +pub fn _mm256_mullo_epi16(a: __m256i, b: __m256i) -> __m256i { + simd_mul(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Multiplies the packed 32-bit integers in `a` and `b`, producing +/// intermediate 64-bit integers, and returns the low 32 bits of the +/// intermediate integers +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mullo_epi32) + +pub fn _mm256_mullo_epi32(a: __m256i, b: __m256i) -> __m256i { + simd_mul(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Multiplies packed 16-bit integers in `a` and `b`, producing +/// intermediate signed 32-bit integers. Truncate each intermediate +/// integer to the 18 most significant bits, round by adding 1, and +/// return bits `[16:1]`. 
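+///
+/// Per lane this amounts to `(((a * b) >> 14) + 1) >> 1` computed on the
+/// exact 32-bit product, as in the `pmulhrsw` helper above; an illustrative
+/// per-lane sketch:
+///
+/// ```
+/// fn mulhrs(a: i16, b: i16) -> i16 {
+///     let t = (a as i32) * (b as i32); // exact 32-bit product
+///     (((t >> 14).wrapping_add(1)) >> 1) as i16 // round, keep bits [16:1]
+/// }
+/// ```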
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_mulhrs_epi16) + +pub fn _mm256_mulhrs_epi16(a: __m256i, b: __m256i) -> __m256i { + pmulhrsw(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Computes the bitwise OR of 256 bits (representing integer data) in `a` +/// and `b` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_or_si256) + +pub fn _mm256_or_si256(a: __m256i, b: __m256i) -> __m256i { + simd_or(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Converts packed 16-bit integers from `a` and `b` to packed 8-bit integers +/// using signed saturation +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_packs_epi16) + +pub fn _mm256_packs_epi16(a: __m256i, b: __m256i) -> __m256i { + packsswb(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Converts packed 32-bit integers from `a` and `b` to packed 16-bit integers +/// using signed saturation +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_packs_epi32) + +pub fn _mm256_packs_epi32(a: __m256i, b: __m256i) -> __m256i { + packssdw(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Converts packed 16-bit integers from `a` and `b` to packed 8-bit integers +/// using unsigned saturation +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_packus_epi16) + +pub fn _mm256_packus_epi16(a: __m256i, b: __m256i) -> __m256i { + packuswb(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Converts packed 32-bit integers from `a` and `b` to packed 16-bit integers +/// using unsigned saturation +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_packus_epi32) + +pub fn _mm256_packus_epi32(a: __m256i, b: __m256i) -> __m256i { + packusdw(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Permutes packed 32-bit integers from `a` according to the content of `b`. +/// +/// The last 3 bits of each integer of `b` are used as addresses into the 8 +/// integers of `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_permutevar8x32_epi32) + +pub fn _mm256_permutevar8x32_epi32(a: __m256i, b: __m256i) -> __m256i { + permd(BitVec::to_u32x8(a), BitVec::to_u32x8(b)).into() +} + +/// Permutes 64-bit integers from `a` using control mask `imm8`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_permute4x64_epi64) + +pub fn _mm256_permute4x64_epi64(a: __m256i) -> __m256i { + let zero = i64x4::from_fn(|_| 0); + let r: i64x4 = simd_shuffle( + BitVec::to_i64x4(a), + zero, + [ + IMM8 as u64 & 0b11, + (IMM8 as u64 >> 2) & 0b11, + (IMM8 as u64 >> 4) & 0b11, + (IMM8 as u64 >> 6) & 0b11, + ], + ); + r.into() +} + +/// Shuffles 128-bits of integer data selected by `imm8` from `a` and `b`. 
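+///
+/// Treating `a` and `b` as two 128-bit lanes each, the low nibble of `imm8`
+/// controls the low result lane and the high nibble the high result lane:
+/// bits `[1:0]` (resp. `[5:4]`) pick one of `a.lo`, `a.hi`, `b.lo`, `b.hi`,
+/// and bit `3` (resp. `7`) forces zero. An illustrative sketch over
+/// `[u128; 2]`:
+///
+/// ```
+/// fn perm2x128(a: [u128; 2], b: [u128; 2], imm8: u8) -> [u128; 2] {
+///     let pick = |ctl: u8| -> u128 {
+///         if ctl & 0b1000 != 0 {
+///             0
+///         } else {
+///             [a[0], a[1], b[0], b[1]][(ctl & 0b11) as usize]
+///         }
+///     };
+///     [pick(imm8), pick(imm8 >> 4)]
+/// }
+/// ```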
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_permute2x128_si256) + +pub fn _mm256_permute2x128_si256(a: __m256i, b: __m256i) -> __m256i { + vperm2i128(BitVec::to_i64x4(a), BitVec::to_i64x4(b), IMM8 as i8).into() +} + +/// Computes the absolute differences of packed unsigned 8-bit integers in `a` +/// and `b`, then horizontally sum each consecutive 8 differences to +/// produce four unsigned 16-bit integers, and pack these unsigned 16-bit +/// integers in the low 16 bits of the 64-bit return value +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sad_epu8) + +pub fn _mm256_sad_epu8(a: __m256i, b: __m256i) -> __m256i { + psadbw(BitVec::to_u8x32(a), BitVec::to_u8x32(b)).into() +} + +/// Shuffles bytes from `a` according to the content of `b`. +/// +/// For each of the 128-bit low and high halves of the vectors, the last +/// 4 bits of each byte of `b` are used as addresses into the respective +/// low or high 16 bytes of `a`. That is, the halves are shuffled separately. +/// +/// In addition, if the highest significant bit of a byte of `b` is set, the +/// respective destination byte is set to 0. +/// +/// Picturing `a` and `b` as `[u8; 32]`, `_mm256_shuffle_epi8` is logically +/// equivalent to: +/// +/// ``` +/// fn mm256_shuffle_epi8(a: [u8; 32], b: [u8; 32]) -> [u8; 32] { +/// let mut r = [0; 32]; +/// for i in 0..16 { +/// if b[i] & 0x80 == 0u8 { +/// r[i] = a[(b[i] % 16) as usize]; +/// } +/// if b[i + 16] & 0x80 == 0u8 { +/// r[i + 16] = a[(b[i + 16] % 16 + 16) as usize]; +/// } +/// } +/// r +/// } +/// ``` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shuffle_epi8) + +pub fn _mm256_shuffle_epi8(a: __m256i, b: __m256i) -> __m256i { + pshufb(BitVec::to_u8x32(a), BitVec::to_u8x32(b)).into() +} + +/// Shuffles 32-bit integers in 128-bit lanes of `a` using the control in +/// `imm8`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shuffle_epi32) + +pub fn _mm256_shuffle_epi32(a: __m256i) -> __m256i { + let r: i32x8 = simd_shuffle( + BitVec::to_i32x8(a), + BitVec::to_i32x8(a), + [ + MASK as u64 & 0b11, + (MASK as u64 >> 2) & 0b11, + (MASK as u64 >> 4) & 0b11, + (MASK as u64 >> 6) & 0b11, + (MASK as u64 & 0b11) + 4, + ((MASK as u64 >> 2) & 0b11) + 4, + ((MASK as u64 >> 4) & 0b11) + 4, + ((MASK as u64 >> 6) & 0b11) + 4, + ], + ); + r.into() +} + +/// Shuffles 16-bit integers in the high 64 bits of 128-bit lanes of `a` using +/// the control in `imm8`. The low 64 bits of 128-bit lanes of `a` are copied +/// to the output. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shufflehi_epi16) + +pub fn _mm256_shufflehi_epi16(a: __m256i) -> __m256i { + let a = BitVec::to_i16x16(a); + let r: i16x16 = simd_shuffle( + a, + a, + [ + 0, + 1, + 2, + 3, + 4 + (IMM8 as u64 & 0b11), + 4 + ((IMM8 as u64 >> 2) & 0b11), + 4 + ((IMM8 as u64 >> 4) & 0b11), + 4 + ((IMM8 as u64 >> 6) & 0b11), + 8, + 9, + 10, + 11, + 12 + (IMM8 as u64 & 0b11), + 12 + ((IMM8 as u64 >> 2) & 0b11), + 12 + ((IMM8 as u64 >> 4) & 0b11), + 12 + ((IMM8 as u64 >> 6) & 0b11), + ], + ); + r.into() +} + +/// Shuffles 16-bit integers in the low 64 bits of 128-bit lanes of `a` using +/// the control in `imm8`. The high 64 bits of 128-bit lanes of `a` are copied +/// to the output. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shufflelo_epi16) + +pub fn _mm256_shufflelo_epi16(a: __m256i) -> __m256i { + let a = BitVec::to_i16x16(a); + let r: i16x16 = simd_shuffle( + a, + a, + [ + 0 + (IMM8 as u64 & 0b11), + 0 + ((IMM8 as u64 >> 2) & 0b11), + 0 + ((IMM8 as u64 >> 4) & 0b11), + 0 + ((IMM8 as u64 >> 6) & 0b11), + 4, + 5, + 6, + 7, + 8 + (IMM8 as u64 & 0b11), + 8 + ((IMM8 as u64 >> 2) & 0b11), + 8 + ((IMM8 as u64 >> 4) & 0b11), + 8 + ((IMM8 as u64 >> 6) & 0b11), + 12, + 13, + 14, + 15, + ], + ); + r.into() +} + +/// Negates packed 16-bit integers in `a` when the corresponding signed +/// 16-bit integer in `b` is negative, and returns the results. +/// Results are zeroed out when the corresponding element in `b` is zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sign_epi16) + +pub fn _mm256_sign_epi16(a: __m256i, b: __m256i) -> __m256i { + psignw(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Negates packed 32-bit integers in `a` when the corresponding signed +/// 32-bit integer in `b` is negative, and returns the results. +/// Results are zeroed out when the corresponding element in `b` is zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sign_epi32) + +pub fn _mm256_sign_epi32(a: __m256i, b: __m256i) -> __m256i { + psignd(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Negates packed 8-bit integers in `a` when the corresponding signed +/// 8-bit integer in `b` is negative, and returns the results. +/// Results are zeroed out when the corresponding element in `b` is zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sign_epi8) + +pub fn _mm256_sign_epi8(a: __m256i, b: __m256i) -> __m256i { + psignb(BitVec::to_i8x32(a), BitVec::to_i8x32(b)).into() +} + +/// Shifts packed 16-bit integers in `a` left by `count` while +/// shifting in zeros, and returns the result +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sll_epi16) + +pub fn _mm256_sll_epi16(a: __m256i, count: __m128i) -> __m256i { + psllw(BitVec::to_i16x16(a), BitVec::to_i16x8(count)).into() +} + +/// Shifts packed 32-bit integers in `a` left by `count` while +/// shifting in zeros, and returns the result +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sll_epi32) + +pub fn _mm256_sll_epi32(a: __m256i, count: __m128i) -> __m256i { + pslld(BitVec::to_i32x8(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 64-bit integers in `a` left by `count` while +/// shifting in zeros, and returns the result +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sll_epi64) + +pub fn _mm256_sll_epi64(a: __m256i, count: __m128i) -> __m256i { + psllq(BitVec::to_i64x4(a), BitVec::to_i64x2(count)).into() +} + +/// Shifts packed 16-bit integers in `a` left by `IMM8` while +/// shifting in zeros, return the results; +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_slli_epi16) + +pub fn _mm256_slli_epi16(a: __m256i) -> __m256i { + if IMM8 >= 16 { + _mm256_setzero_si256() + } else { + simd_shl(BitVec::to_u16x16(a), u16x16::splat(IMM8 as 
u16)).into() + } +} + +/// Shifts packed 32-bit integers in `a` left by `IMM8` while +/// shifting in zeros, return the results; +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_slli_epi32) + +pub fn _mm256_slli_epi32(a: __m256i) -> __m256i { + if IMM8 >= 32 { + _mm256_setzero_si256() + } else { + simd_shl(BitVec::to_u32x8(a), u32x8::splat(IMM8 as u32)).into() + } +} + +/// Shifts packed 64-bit integers in `a` left by `IMM8` while +/// shifting in zeros, return the results; +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_slli_epi64) + +pub fn _mm256_slli_epi64(a: __m256i) -> __m256i { + if IMM8 >= 64 { + _mm256_setzero_si256() + } else { + simd_shl(BitVec::to_u64x4(a), u64x4::splat(IMM8 as u64)).into() + } +} + +/// Shifts 128-bit lanes in `a` left by `imm8` bytes while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_slli_si256) + +pub fn _mm256_slli_si256(a: __m256i) -> __m256i { + _mm256_bslli_epi128::(a) +} + +/// Shifts 128-bit lanes in `a` left by `imm8` bytes while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bslli_epi128) + +pub fn _mm256_bslli_epi128(a: __m256i) -> __m256i { + const fn mask(shift: i32, i: u32) -> u32 { + let shift = shift as u32 & 0xff; + if shift > 15 || i % 16 < shift { + 0 + } else { + 32 + (i - shift) + } + } + let a = BitVec::to_i8x32(a); + let r: i8x32 = simd_shuffle( + i8x32::from_fn(|_| 0), + a, + [ + mask(IMM8, 0) as u64, + mask(IMM8, 1) as u64, + mask(IMM8, 2) as u64, + mask(IMM8, 3) as u64, + mask(IMM8, 4) as u64, + mask(IMM8, 5) as u64, + mask(IMM8, 6) as u64, + mask(IMM8, 7) as u64, + mask(IMM8, 8) as u64, + mask(IMM8, 9) as u64, + mask(IMM8, 10) as u64, + mask(IMM8, 11) as u64, + mask(IMM8, 12) as u64, + mask(IMM8, 13) as u64, + mask(IMM8, 14) as u64, + mask(IMM8, 15) as u64, + mask(IMM8, 16) as u64, + mask(IMM8, 17) as u64, + mask(IMM8, 18) as u64, + mask(IMM8, 19) as u64, + mask(IMM8, 20) as u64, + mask(IMM8, 21) as u64, + mask(IMM8, 22) as u64, + mask(IMM8, 23) as u64, + mask(IMM8, 24) as u64, + mask(IMM8, 25) as u64, + mask(IMM8, 26) as u64, + mask(IMM8, 27) as u64, + mask(IMM8, 28) as u64, + mask(IMM8, 29) as u64, + mask(IMM8, 30) as u64, + mask(IMM8, 31) as u64, + ], + ); + r.into() +} + +/// Shifts packed 32-bit integers in `a` left by the amount +/// specified by the corresponding element in `count` while +/// shifting in zeros, and returns the result. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sllv_epi32) + +pub fn _mm_sllv_epi32(a: __m128i, count: __m128i) -> __m128i { + psllvd(BitVec::to_i32x4(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 32-bit integers in `a` left by the amount +/// specified by the corresponding element in `count` while +/// shifting in zeros, and returns the result. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sllv_epi32) + +pub fn _mm256_sllv_epi32(a: __m256i, count: __m256i) -> __m256i { + psllvd256(BitVec::to_i32x8(a), BitVec::to_i32x8(count)).into() +} + +/// Shifts packed 64-bit integers in `a` left by the amount +/// specified by the corresponding element in `count` while +/// shifting in zeros, and returns the result. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sllv_epi64) + +pub fn _mm_sllv_epi64(a: __m128i, count: __m128i) -> __m128i { + psllvq(BitVec::to_i64x2(a), BitVec::to_i64x2(count)).into() +} + +/// Shifts packed 64-bit integers in `a` left by the amount +/// specified by the corresponding element in `count` while +/// shifting in zeros, and returns the result. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sllv_epi64) + +pub fn _mm256_sllv_epi64(a: __m256i, count: __m256i) -> __m256i { + psllvq256(BitVec::to_i64x4(a), BitVec::to_i64x4(count)).into() +} + +/// Shifts packed 16-bit integers in `a` right by `count` while +/// shifting in sign bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sra_epi16) + +pub fn _mm256_sra_epi16(a: __m256i, count: __m128i) -> __m256i { + psraw(BitVec::to_i16x16(a), BitVec::to_i16x8(count)).into() +} + +/// Shifts packed 32-bit integers in `a` right by `count` while +/// shifting in sign bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sra_epi32) + +pub fn _mm256_sra_epi32(a: __m256i, count: __m128i) -> __m256i { + psrad(BitVec::to_i32x8(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 16-bit integers in `a` right by `IMM8` while +/// shifting in sign bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srai_epi16) + +pub fn _mm256_srai_epi16(a: __m256i) -> __m256i { + simd_shr(BitVec::to_i16x16(a), i16x16::splat(IMM8.min(15) as i16)).into() +} + +/// Shifts packed 32-bit integers in `a` right by `IMM8` while +/// shifting in sign bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srai_epi32) + +pub fn _mm256_srai_epi32(a: __m256i) -> __m256i { + simd_shr(BitVec::to_i32x8(a), i32x8::splat(IMM8.min(31))).into() +} + +/// Shifts packed 32-bit integers in `a` right by the amount specified by the +/// corresponding element in `count` while shifting in sign bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srav_epi32) + +pub fn _mm_srav_epi32(a: __m128i, count: __m128i) -> __m128i { + psravd(BitVec::to_i32x4(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 32-bit integers in `a` right by the amount specified by the +/// corresponding element in `count` while shifting in sign bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srav_epi32) + +pub fn _mm256_srav_epi32(a: __m256i, count: __m256i) -> __m256i { + psravd256(BitVec::to_i32x8(a), BitVec::to_i32x8(count)).into() +} + +/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srli_si256) + +pub fn _mm256_srli_si256(a: __m256i) -> __m256i { + _mm256_bsrli_epi128::(a) +} + +/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros. 
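+///
+/// Viewing each 128-bit lane as `[u8; 16]`, byte `j` of the result is byte
+/// `j + imm8` of the same lane of `a`, or `0` once that index runs past the
+/// lane; an illustrative scalar sketch (not the model defined below):
+///
+/// ```
+/// fn bsrli_epi128(a: [u8; 32], imm8: usize) -> [u8; 32] {
+///     let mut r = [0u8; 32];
+///     for lane in 0..2 {
+///         for j in 0..16 {
+///             if j + imm8 < 16 {
+///                 r[16 * lane + j] = a[16 * lane + j + imm8];
+///             }
+///         }
+///     }
+///     r
+/// }
+/// ```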
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128) + +pub fn _mm256_bsrli_epi128(a: __m256i) -> __m256i { + const fn mask(shift: i32, i: u32) -> u64 { + let shift = shift as u32 & 0xff; + if shift > 15 || (15 - (i % 16)) < shift { + 0 as u64 + } else { + (32 + (i + shift)) as u64 + } + } + + let a = BitVec::to_i8x32(a); + let r: i8x32 = simd_shuffle( + i8x32::from_fn(|_| 0), + a, + [ + mask(IMM8, 0), + mask(IMM8, 1), + mask(IMM8, 2), + mask(IMM8, 3), + mask(IMM8, 4), + mask(IMM8, 5), + mask(IMM8, 6), + mask(IMM8, 7), + mask(IMM8, 8), + mask(IMM8, 9), + mask(IMM8, 10), + mask(IMM8, 11), + mask(IMM8, 12), + mask(IMM8, 13), + mask(IMM8, 14), + mask(IMM8, 15), + mask(IMM8, 16), + mask(IMM8, 17), + mask(IMM8, 18), + mask(IMM8, 19), + mask(IMM8, 20), + mask(IMM8, 21), + mask(IMM8, 22), + mask(IMM8, 23), + mask(IMM8, 24), + mask(IMM8, 25), + mask(IMM8, 26), + mask(IMM8, 27), + mask(IMM8, 28), + mask(IMM8, 29), + mask(IMM8, 30), + mask(IMM8, 31), + ], + ); + + r.into() +} + +/// Shifts packed 16-bit integers in `a` right by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srl_epi16) + +pub fn _mm256_srl_epi16(a: __m256i, count: __m128i) -> __m256i { + psrlw(BitVec::to_i16x16(a), BitVec::to_i16x8(count)).into() +} + +/// Shifts packed 32-bit integers in `a` right by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srl_epi32) + +pub fn _mm256_srl_epi32(a: __m256i, count: __m128i) -> __m256i { + psrld(BitVec::to_i32x8(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 64-bit integers in `a` right by `count` while shifting in +/// zeros. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srl_epi64) + +pub fn _mm256_srl_epi64(a: __m256i, count: __m128i) -> __m256i { + psrlq(BitVec::to_i64x4(a), BitVec::to_i64x2(count)).into() +} + +/// Shifts packed 16-bit integers in `a` right by `IMM8` while shifting in +/// zeros +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srli_epi16) + +pub fn _mm256_srli_epi16(a: __m256i) -> __m256i { + if IMM8 >= 16 { + _mm256_setzero_si256() + } else { + simd_shr(BitVec::to_u16x16(a), u16x16::splat(IMM8 as u16)).into() + } +} + +/// Shifts packed 32-bit integers in `a` right by `IMM8` while shifting in +/// zeros +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srli_epi32) + +pub fn _mm256_srli_epi32(a: __m256i) -> __m256i { + if IMM8 >= 32 { + _mm256_setzero_si256() + } else { + simd_shr(BitVec::to_u32x8(a), u32x8::splat(IMM8 as u32)).into() + } +} + +/// Shifts packed 64-bit integers in `a` right by `IMM8` while shifting in +/// zeros +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srli_epi64) + +pub fn _mm256_srli_epi64(a: __m256i) -> __m256i { + if IMM8 >= 64 { + _mm256_setzero_si256() + } else { + simd_shr(BitVec::to_u64x4(a), u64x4::splat(IMM8 as u64)).into() + } +} + +/// Shifts packed 32-bit integers in `a` right by the amount specified by +/// the corresponding element in `count` while shifting in zeros, +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srlv_epi32) + +pub fn _mm_srlv_epi32(a: __m128i, count: __m128i) -> __m128i { + psrlvd(BitVec::to_i32x4(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 32-bit integers in `a` right by the amount specified by +/// the corresponding element in `count` while shifting in zeros, +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srlv_epi32) + +pub fn _mm256_srlv_epi32(a: __m256i, count: __m256i) -> __m256i { + psrlvd256(BitVec::to_i32x8(a), BitVec::to_i32x8(count)).into() +} + +/// Shifts packed 64-bit integers in `a` right by the amount specified by +/// the corresponding element in `count` while shifting in zeros, +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srlv_epi64) + +pub fn _mm_srlv_epi64(a: __m128i, count: __m128i) -> __m128i { + psrlvq(BitVec::to_i64x2(a), BitVec::to_i64x2(count)).into() +} + +/// Shifts packed 64-bit integers in `a` right by the amount specified by +/// the corresponding element in `count` while shifting in zeros, +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_srlv_epi64) + +pub fn _mm256_srlv_epi64(a: __m256i, count: __m256i) -> __m256i { + psrlvq256(BitVec::to_i64x4(a), BitVec::to_i64x4(count)).into() +} + +/// Subtract packed 16-bit integers in `b` from packed 16-bit integers in `a` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sub_epi16) + +pub fn _mm256_sub_epi16(a: __m256i, b: __m256i) -> __m256i { + simd_sub(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Subtract packed 32-bit integers in `b` from packed 32-bit integers in `a` +/// +/// [Intel's 
documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sub_epi32) + +pub fn _mm256_sub_epi32(a: __m256i, b: __m256i) -> __m256i { + simd_sub(BitVec::to_i32x8(a), BitVec::to_i32x8(b)).into() +} + +/// Subtract packed 64-bit integers in `b` from packed 64-bit integers in `a` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sub_epi64) + +pub fn _mm256_sub_epi64(a: __m256i, b: __m256i) -> __m256i { + simd_sub(BitVec::to_i64x4(a), BitVec::to_i64x4(b)).into() +} + +/// Subtract packed 8-bit integers in `b` from packed 8-bit integers in `a` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_sub_epi8) + +pub fn _mm256_sub_epi8(a: __m256i, b: __m256i) -> __m256i { + simd_sub(BitVec::to_i8x32(a), BitVec::to_i8x32(b)).into() +} + +/// Subtract packed 16-bit integers in `b` from packed 16-bit integers in +/// `a` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_subs_epi16) + +pub fn _mm256_subs_epi16(a: __m256i, b: __m256i) -> __m256i { + simd_saturating_sub(BitVec::to_i16x16(a), BitVec::to_i16x16(b)).into() +} + +/// Subtract packed 8-bit integers in `b` from packed 8-bit integers in +/// `a` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_subs_epi8) + +pub fn _mm256_subs_epi8(a: __m256i, b: __m256i) -> __m256i { + simd_saturating_sub(BitVec::to_i8x32(a), BitVec::to_i8x32(b)).into() +} + +/// Subtract packed unsigned 16-bit integers in `b` from packed 16-bit +/// integers in `a` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_subs_epu16) + +pub fn _mm256_subs_epu16(a: __m256i, b: __m256i) -> __m256i { + simd_saturating_sub(BitVec::to_u16x16(a), BitVec::to_u16x16(b)).into() +} + +/// Subtract packed unsigned 8-bit integers in `b` from packed 8-bit +/// integers in `a` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_subs_epu8) + +pub fn _mm256_subs_epu8(a: __m256i, b: __m256i) -> __m256i { + simd_saturating_sub(BitVec::to_u8x32(a), BitVec::to_u8x32(b)).into() +} + +/// Unpacks and interleave 8-bit integers from the high half of each +/// 128-bit lane in `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpackhi_epi8) + +pub fn _mm256_unpackhi_epi8(a: __m256i, b: __m256i) -> __m256i { + #[rustfmt::skip] + let r: i8x32 = simd_shuffle(BitVec::to_i8x32(a), BitVec::to_i8x32(b), [ + 8, 40, 9, 41, 10, 42, 11, 43, + 12, 44, 13, 45, 14, 46, 15, 47, + 24, 56, 25, 57, 26, 58, 27, 59, + 28, 60, 29, 61, 30, 62, 31, 63, + ]); + r.into() +} + +/// Unpacks and interleave 8-bit integers from the low half of each +/// 128-bit lane of `a` and `b`. 
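+///
+/// Within each 128-bit lane, the low eight bytes of `a` and `b` are
+/// interleaved as `a0, b0, a1, b1, ...`. A scalar sketch of one lane
+/// (illustrative only, not part of this crate):
+///
+/// ```
+/// fn unpacklo_lane(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
+///     core::array::from_fn(|i| if i % 2 == 0 { a[i / 2] } else { b[i / 2] })
+/// }
+/// let r = unpacklo_lane([1; 16], [2; 16]);
+/// assert_eq!((r[0], r[1], r[2], r[3]), (1, 2, 1, 2));
+/// ```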
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpacklo_epi8) + +pub fn _mm256_unpacklo_epi8(a: __m256i, b: __m256i) -> __m256i { + #[rustfmt::skip] + let r: i8x32 = simd_shuffle(BitVec::to_i8x32(a), BitVec::to_i8x32(b), [ + 0, 32, 1, 33, 2, 34, 3, 35, + 4, 36, 5, 37, 6, 38, 7, 39, + 16, 48, 17, 49, 18, 50, 19, 51, + 20, 52, 21, 53, 22, 54, 23, 55, + ]); + r.into() +} + +/// Unpacks and interleave 16-bit integers from the high half of each +/// 128-bit lane of `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpackhi_epi16) + +pub fn _mm256_unpackhi_epi16(a: __m256i, b: __m256i) -> __m256i { + let r: i16x16 = simd_shuffle( + BitVec::to_i16x16(a), + BitVec::to_i16x16(b), + [4, 20, 5, 21, 6, 22, 7, 23, 12, 28, 13, 29, 14, 30, 15, 31], + ); + r.into() +} + +/// Unpacks and interleave 16-bit integers from the low half of each +/// 128-bit lane of `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpacklo_epi16) + +pub fn _mm256_unpacklo_epi16(a: __m256i, b: __m256i) -> __m256i { + let r: i16x16 = simd_shuffle( + BitVec::to_i16x16(a), + BitVec::to_i16x16(b), + [0, 16, 1, 17, 2, 18, 3, 19, 8, 24, 9, 25, 10, 26, 11, 27], + ); + r.into() +} + +/// Unpacks and interleave 32-bit integers from the high half of each +/// 128-bit lane of `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpackhi_epi32) + +pub fn _mm256_unpackhi_epi32(a: __m256i, b: __m256i) -> __m256i { + let r: i32x8 = simd_shuffle( + BitVec::to_i32x8(a), + BitVec::to_i32x8(b), + [2, 10, 3, 11, 6, 14, 7, 15], + ); + r.into() +} + +/// Unpacks and interleave 32-bit integers from the low half of each +/// 128-bit lane of `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpacklo_epi32) + +pub fn _mm256_unpacklo_epi32(a: __m256i, b: __m256i) -> __m256i { + let r: i32x8 = simd_shuffle( + BitVec::to_i32x8(a), + BitVec::to_i32x8(b), + [0, 8, 1, 9, 4, 12, 5, 13], + ); + r.into() +} + +/// Unpacks and interleave 64-bit integers from the high half of each +/// 128-bit lane of `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpackhi_epi64) + +pub fn _mm256_unpackhi_epi64(a: __m256i, b: __m256i) -> __m256i { + let r: i64x4 = simd_shuffle(BitVec::to_i64x4(a), BitVec::to_i64x4(b), [1, 5, 3, 7]); + r.into() +} + +/// Unpacks and interleave 64-bit integers from the low half of each +/// 128-bit lane of `a` and `b`. +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_unpacklo_epi64) + +pub fn _mm256_unpacklo_epi64(a: __m256i, b: __m256i) -> __m256i { + let r: i64x4 = simd_shuffle(BitVec::to_i64x4(a), BitVec::to_i64x4(b), [0, 4, 2, 6]); + r.into() +} + +/// Computes the bitwise XOR of 256 bits (representing integer data) +/// in `a` and `b` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_xor_si256) + +pub fn _mm256_xor_si256(a: __m256i, b: __m256i) -> __m256i { + simd_xor(BitVec::to_i64x4(a), BitVec::to_i64x4(b)).into() +} + +/// Extracts an 8-bit integer from `a`, selected with `INDEX`. Returns a 32-bit +/// integer containing the zero-extended integer data. 
+/// +/// See [LLVM commit D20468](https://reviews.llvm.org/D20468). +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_extract_epi8) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_extract_epi8(a: __m256i) -> i32 { + simd_extract(BitVec::to_u8x32(a), INDEX as u64) as u32 as i32 +} + +/// Extracts a 16-bit integer from `a`, selected with `INDEX`. Returns a 32-bit +/// integer containing the zero-extended integer data. +/// +/// See [LLVM commit D20468](https://reviews.llvm.org/D20468). +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_extract_epi16) + +// This intrinsic has no corresponding instruction. + +pub fn _mm256_extract_epi16(a: __m256i) -> i32 { + simd_extract(BitVec::to_u16x16(a), INDEX as u64) as u32 as i32 +} diff --git a/testable-simd-models/src/core_arch/x86/models/mod.rs b/testable-simd-models/src/core_arch/x86/models/mod.rs new file mode 100644 index 0000000000000..95c9eb4061b6a --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/models/mod.rs @@ -0,0 +1,37 @@ +//! Rust models for x86 intrinsics. +//! +//! This module contains models for the intrinsics as they are defined in the Rust core. +//! Since this is supposed to model the Rust core, the implemented functions must +//! mirror the Rust implementations as closely as they can. +//! +//! For example, calls to simd functions like simd_add and simd_sub are left as is, +//! with their implementations defined in `crate::abstractions::simd`. Some other +//! operations like simd_cast or simd_shuffle might need a little modification +//! for correct compilation. +//! +//! Calls to transmute are replaced with either an explicit call to a `BitVec::from_ function`, +//! or with `.into()`. +//! +//! Sometimes, an intrinsic in Rust is implemented by directly using the corresponding +//! LLVM instruction via an `unsafe extern "C"` module. In those cases, the corresponding +//! function is defined in the `c_extern` module in each file, which contain manually +//! written implementations made by consulting the appropriate Intel documentation. +//! +//! In general, it is best to gain an idea of how an implementation should be written by looking +//! at how other functions are implemented. Also see `core::arch::x86` for [reference](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch). + +pub mod avx; +pub mod avx2; +pub mod sse2; +pub mod ssse3; + +pub(crate) mod types { + use crate::abstractions::bitvec::*; + + #[allow(non_camel_case_types)] + pub type __m256i = BitVec<256>; + #[allow(non_camel_case_types)] + pub type __m256 = BitVec<256>; + #[allow(non_camel_case_types)] + pub type __m128i = BitVec<128>; +} diff --git a/testable-simd-models/src/core_arch/x86/models/sse2.rs b/testable-simd-models/src/core_arch/x86/models/sse2.rs new file mode 100644 index 0000000000000..ed57f03cfd5d8 --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/models/sse2.rs @@ -0,0 +1,1303 @@ +//! 
Streaming SIMD Extensions 2 (SSE2) +use super::types::*; +use crate::abstractions::{bit::Bit, bitvec::BitVec, simd::*}; +mod c_extern { + use crate::abstractions::{bit::MachineInteger, simd::*}; + pub fn packsswb(a: i16x8, b: i16x8) -> i8x16 { + i8x16::from_fn(|i| { + if i < 8 { + if a[i] > (i8::MAX as i16) { + i8::MAX + } else if a[i] < (i8::MIN as i16) { + i8::MIN + } else { + a[i] as i8 + } + } else { + if b[i - 8] > (i8::MAX as i16) { + i8::MAX + } else if b[i - 8] < (i8::MIN as i16) { + i8::MIN + } else { + b[i - 8] as i8 + } + } + }) + } + pub fn pmaddwd(a: i16x8, b: i16x8) -> i32x4 { + i32x4::from_fn(|i| { + (a[2 * i] as i32) * (b[2 * i] as i32) + (a[2 * i + 1] as i32) * (b[2 * i + 1] as i32) + }) + } + pub fn psadbw(a: u8x16, b: u8x16) -> u64x2 { + let tmp = u8x16::from_fn(|i| a[i].absolute_diff(b[i])); + u64x2::from_fn(|i| { + (tmp[i * 8] as u16) + .wrapping_add(tmp[i * 8 + 1] as u16) + .wrapping_add(tmp[i * 8 + 2] as u16) + .wrapping_add(tmp[i * 8 + 3] as u16) + .wrapping_add(tmp[i * 8 + 4] as u16) + .wrapping_add(tmp[i * 8 + 5] as u16) + .wrapping_add(tmp[i * 8 + 6] as u16) + .wrapping_add(tmp[i * 8 + 7] as u16) as u64 + }) + } + pub fn psllw(a: i16x8, count: i16x8) -> i16x8 { + let count4: u64 = (count[0] as u16) as u64; + let count3: u64 = ((count[1] as u16) as u64) * 65536; + let count2: u64 = ((count[2] as u16) as u64) * 4294967296; + let count1: u64 = ((count[3] as u16) as u64) * 281474976710656; + let count = count1 + count2 + count3 + count4; + i16x8::from_fn(|i| { + if count > 15 { + 0 + } else { + ((a[i] as u16) << count) as i16 + } + }) + } + + pub fn pslld(a: i32x4, count: i32x4) -> i32x4 { + let count: u64 = ((count[1] as u32) as u64) * 4294967296 + ((count[0] as u32) as u64); + + i32x4::from_fn(|i| { + if count > 31 { + 0 + } else { + ((a[i] as u32) << count) as i32 + } + }) + } + + pub fn psllq(a: i64x2, count: i64x2) -> i64x2 { + let count: u64 = count[0] as u64; + + i64x2::from_fn(|i| { + if count > 63 { + 0 + } else { + ((a[i] as u64) << count) as i64 + } + }) + } + + pub fn psraw(a: i16x8, count: i16x8) -> i16x8 { + let count: u64 = ((count[3] as u16) as u64) * 281474976710656 + + ((count[2] as u16) as u64) * 4294967296 + + ((count[1] as u16) as u64) * 65536 + + ((count[0] as u16) as u64); + + i16x8::from_fn(|i| { + if count > 15 { + if a[i] < 0 { + -1 + } else { + 0 + } + } else { + a[i] >> count + } + }) + } + + pub fn psrad(a: i32x4, count: i32x4) -> i32x4 { + let count: u64 = ((count[1] as u32) as u64) * 4294967296 + ((count[0] as u32) as u64); + + i32x4::from_fn(|i| { + if count > 31 { + if a[i] < 0 { + -1 + } else { + 0 + } + } else { + a[i] << count + } + }) + } + + pub fn psrlw(a: i16x8, count: i16x8) -> i16x8 { + let count: u64 = (count[3] as u16 as u64) * 281474976710656 + + (count[2] as u16 as u64) * 4294967296 + + (count[1] as u16 as u64) * 65536 + + (count[0] as u16 as u64); + + i16x8::from_fn(|i| { + if count > 15 { + 0 + } else { + ((a[i] as u16) >> count) as i16 + } + }) + } + + pub fn psrld(a: i32x4, count: i32x4) -> i32x4 { + let count: u64 = (count[1] as u32 as u64) * 4294967296 + (count[0] as u32 as u64); + + i32x4::from_fn(|i| { + if count > 31 { + 0 + } else { + ((a[i] as u32) >> count) as i32 + } + }) + } + + pub fn psrlq(a: i64x2, count: i64x2) -> i64x2 { + let count: u64 = count[0] as u64; + + i64x2::from_fn(|i| { + if count > 63 { + 0 + } else { + ((a[i] as u64) >> count) as i64 + } + }) + } + + pub fn packssdw(a: i32x4, b: i32x4) -> i16x8 { + i16x8::from_fn(|i| { + if i < 4 { + if a[i] > (i16::MAX as i32) { + i16::MAX + } else if a[i] 
< (i16::MIN as i32) { + i16::MIN + } else { + a[i] as i16 + } + } else { + if b[i - 4] > (i16::MAX as i32) { + i16::MAX + } else if b[i - 4] < (i16::MIN as i32) { + i16::MIN + } else { + b[i - 4] as i16 + } + } + }) + } + + pub fn packuswb(a: i16x8, b: i16x8) -> u8x16 { + u8x16::from_fn(|i| { + if i < 8 { + if a[i] > (u8::MAX as i16) { + u8::MAX + } else if a[i] < (u8::MIN as i16) { + u8::MIN + } else { + a[i] as u8 + } + } else { + if b[i - 8] > (u8::MAX as i16) { + u8::MAX + } else if b[i - 8] < (u8::MIN as i16) { + u8::MIN + } else { + b[i - 8] as u8 + } + } + }) + } +} + +use c_extern::*; + +/// Adds packed 8-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_add_epi8) + +pub fn _mm_add_epi8(a: __m128i, b: __m128i) -> __m128i { + simd_add(BitVec::to_i8x16(a), BitVec::to_i8x16(b)).into() +} + +/// Adds packed 16-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_add_epi16) + +pub fn _mm_add_epi16(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_i16x8(simd_add(BitVec::to_i16x8(a), BitVec::to_i16x8(b))) +} + +/// Adds packed 32-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_add_epi32) + +pub fn _mm_add_epi32(a: __m128i, b: __m128i) -> __m128i { + simd_add(BitVec::to_i32x4(a), BitVec::to_i32x4(b)).into() +} + +/// Adds packed 64-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_add_epi64) + +pub fn _mm_add_epi64(a: __m128i, b: __m128i) -> __m128i { + simd_add(BitVec::to_i64x2(a), BitVec::to_i64x2(b)).into() +} + +/// Adds packed 8-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_adds_epi8) + +pub fn _mm_adds_epi8(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_add(BitVec::to_i8x16(a), BitVec::to_i8x16(b)).into() +} + +/// Adds packed 16-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_adds_epi16) + +pub fn _mm_adds_epi16(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_add(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Adds packed unsigned 8-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_adds_epu8) + +pub fn _mm_adds_epu8(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_add(BitVec::to_u8x16(a), BitVec::to_u8x16(b)).into() +} + +/// Adds packed unsigned 16-bit integers in `a` and `b` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_adds_epu16) + +pub fn _mm_adds_epu16(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_add(BitVec::to_u16x8(a), BitVec::to_u16x8(b)).into() +} + +/// Averages packed unsigned 8-bit integers in `a` and `b`. 
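+///
+/// The "average" is the rounding average `(a + b + 1) >> 1`, computed in a
+/// wider type so that the addition cannot overflow, as in this scalar sketch
+/// (illustrative only):
+///
+/// ```
+/// fn avg_u8(a: u8, b: u8) -> u8 {
+///     ((a as u16 + b as u16 + 1) >> 1) as u8
+/// }
+/// assert_eq!(avg_u8(1, 2), 2);
+/// assert_eq!(avg_u8(255, 255), 255);
+/// ```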
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_avg_epu8) + +pub fn _mm_avg_epu8(a: __m128i, b: __m128i) -> __m128i { + let a = simd_cast::<16, _, u16>(BitVec::to_u8x16(a)); + let b = simd_cast::<16, _, u16>(BitVec::to_u8x16(b)); + let r = simd_shr(simd_add(simd_add(a, b), u16x16::splat(1)), u16x16::splat(1)); + simd_cast::<16, _, u8>(r).into() +} + +/// Averages packed unsigned 16-bit integers in `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_avg_epu16) + +pub fn _mm_avg_epu16(a: __m128i, b: __m128i) -> __m128i { + let a = simd_cast::<8, _, u32>(BitVec::to_u16x8(a)); + let b = simd_cast::<8, _, u32>(BitVec::to_u16x8(b)); + let r = simd_shr(simd_add(simd_add(a, b), u32x8::splat(1)), u32x8::splat(1)); + simd_cast::<8, _, u16>(r).into() +} + +/// Multiplies and then horizontally add signed 16 bit integers in `a` and `b`. +/// +/// Multiplies packed signed 16-bit integers in `a` and `b`, producing +/// intermediate signed 32-bit integers. Horizontally add adjacent pairs of +/// intermediate 32-bit integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_madd_epi16) + +pub fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i { + pmaddwd(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Compares packed 16-bit integers in `a` and `b`, and returns the packed +/// maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_max_epi16) + +pub fn _mm_max_epi16(a: __m128i, b: __m128i) -> __m128i { + let a = BitVec::to_i16x8(a); + let b = BitVec::to_i16x8(b); + simd_select(simd_gt(a, b), a, b).into() +} + +/// Compares packed unsigned 8-bit integers in `a` and `b`, and returns the +/// packed maximum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_max_epu8) + +pub fn _mm_max_epu8(a: __m128i, b: __m128i) -> __m128i { + let a = BitVec::to_u8x16(a); + let b = BitVec::to_u8x16(b); + simd_select(simd_gt(a, b), a, b).into() +} + +/// Compares packed 16-bit integers in `a` and `b`, and returns the packed +/// minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_min_epi16) + +pub fn _mm_min_epi16(a: __m128i, b: __m128i) -> __m128i { + let a = BitVec::to_i16x8(a); + let b = BitVec::to_i16x8(b); + simd_select(simd_lt(a, b), a, b).into() +} + +/// Compares packed unsigned 8-bit integers in `a` and `b`, and returns the +/// packed minimum values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_min_epu8) + +pub fn _mm_min_epu8(a: __m128i, b: __m128i) -> __m128i { + let a = BitVec::to_u8x16(a); + let b = BitVec::to_u8x16(b); + simd_select(simd_lt(a, b), a, b).into() +} + +/// Multiplies the packed 16-bit integers in `a` and `b`. +/// +/// The multiplication produces intermediate 32-bit integers, and returns the +/// high 16 bits of the intermediate integers. 
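+///
+/// Per lane this amounts to widening, multiplying, and keeping the upper half,
+/// as in this scalar sketch (illustrative only):
+///
+/// ```
+/// fn mulhi_i16(a: i16, b: i16) -> i16 {
+///     (((a as i32) * (b as i32)) >> 16) as i16
+/// }
+/// assert_eq!(mulhi_i16(0x4000, 4), 1);
+/// ```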
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_mulhi_epi16) + +pub fn _mm_mulhi_epi16(a: __m128i, b: __m128i) -> __m128i { + let a = simd_cast::<8, i16, i32>(BitVec::to_i16x8(a)); + let b = simd_cast::<8, i16, i32>(BitVec::to_i16x8(b)); + let r = simd_shr(simd_mul(a, b), i32x8::splat(16)); + BitVec::from_i16x8(simd_cast::<8, i32, i16>(r)) +} + +/// Multiplies the packed unsigned 16-bit integers in `a` and `b`. +/// +/// The multiplication produces intermediate 32-bit integers, and returns the +/// high 16 bits of the intermediate integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_mulhi_epu16) + +pub fn _mm_mulhi_epu16(a: __m128i, b: __m128i) -> __m128i { + let a = simd_cast::<8, _, u32>(BitVec::to_u16x8(a)); + let b = simd_cast::<8, _, u32>(BitVec::to_u16x8(b)); + let r = simd_shr(simd_mul(a, b), u32x8::splat(16)); + simd_cast::<8, u32, u16>(r).into() +} + +/// Multiplies the packed 16-bit integers in `a` and `b`. +/// +/// The multiplication produces intermediate 32-bit integers, and returns the +/// low 16 bits of the intermediate integers. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_mullo_epi16) + +pub fn _mm_mullo_epi16(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_i16x8(simd_mul(BitVec::to_i16x8(a), BitVec::to_i16x8(b))) +} + +/// Multiplies the low unsigned 32-bit integers from each packed 64-bit element +/// in `a` and `b`. +/// +/// Returns the unsigned 64-bit results. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_mul_epu32) + +pub fn _mm_mul_epu32(a: __m128i, b: __m128i) -> __m128i { + let a = BitVec::to_u64x2(a); + let b = BitVec::to_u64x2(b); + let mask = u64x2::splat(u32::MAX.into()); + simd_mul(simd_and(a, mask), simd_and(b, mask)).into() +} + +/// Sum the absolute differences of packed unsigned 8-bit integers. +/// +/// Computes the absolute differences of packed unsigned 8-bit integers in `a` +/// and `b`, then horizontally sum each consecutive 8 differences to produce +/// two unsigned 16-bit integers, and pack these unsigned 16-bit integers in +/// the low 16 bits of 64-bit elements returned. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sad_epu8) + +pub fn _mm_sad_epu8(a: __m128i, b: __m128i) -> __m128i { + psadbw(BitVec::to_u8x16(a), BitVec::to_u8x16(b)).into() +} + +/// Subtracts packed 8-bit integers in `b` from packed 8-bit integers in `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sub_epi8) + +pub fn _mm_sub_epi8(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_i8x16(simd_sub(BitVec::to_i8x16(a), BitVec::to_i8x16(b))) +} + +/// Subtracts packed 16-bit integers in `b` from packed 16-bit integers in `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sub_epi16) + +pub fn _mm_sub_epi16(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_i16x8(simd_sub(BitVec::to_i16x8(a), BitVec::to_i16x8(b))) +} + +/// Subtract packed 32-bit integers in `b` from packed 32-bit integers in `a`. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sub_epi32) + +pub fn _mm_sub_epi32(a: __m128i, b: __m128i) -> __m128i { + simd_sub(BitVec::to_i32x4(a), BitVec::to_i32x4(b)).into() +} + +/// Subtract packed 64-bit integers in `b` from packed 64-bit integers in `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sub_epi64) + +pub fn _mm_sub_epi64(a: __m128i, b: __m128i) -> __m128i { + simd_sub(BitVec::to_i64x2(a), BitVec::to_i64x2(b)).into() +} + +/// Subtract packed 8-bit integers in `b` from packed 8-bit integers in `a` +/// using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_subs_epi8) + +pub fn _mm_subs_epi8(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_sub(BitVec::to_i8x16(a), BitVec::to_i8x16(b)).into() +} + +/// Subtract packed 16-bit integers in `b` from packed 16-bit integers in `a` +/// using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_subs_epi16) + +pub fn _mm_subs_epi16(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_sub(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Subtract packed unsigned 8-bit integers in `b` from packed unsigned 8-bit +/// integers in `a` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_subs_epu8) + +pub fn _mm_subs_epu8(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_sub(BitVec::to_u8x16(a), BitVec::to_u8x16(b)).into() +} + +/// Subtract packed unsigned 16-bit integers in `b` from packed unsigned 16-bit +/// integers in `a` using saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_subs_epu16) + +pub fn _mm_subs_epu16(a: __m128i, b: __m128i) -> __m128i { + simd_saturating_sub(BitVec::to_u16x8(a), BitVec::to_u16x8(b)).into() +} + +/// Shifts `a` left by `IMM8` bytes while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_slli_si128) + +pub fn _mm_slli_si128(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + _mm_slli_si128_impl::(a) +} + +/// Implementation detail: converts the immediate argument of the +/// `_mm_slli_si128` intrinsic into a compile-time constant. + +fn _mm_slli_si128_impl(a: __m128i) -> __m128i { + const fn mask(shift: i32, i: u32) -> u64 { + let shift = shift as u32 & 0xff; + if shift > 15 { + i as u64 + } else { + (16 - shift + i) as u64 + } + } + (simd_shuffle( + i8x16::from_fn(|_| 0), + BitVec::to_i8x16(a), + [ + mask(IMM8, 0), + mask(IMM8, 1), + mask(IMM8, 2), + mask(IMM8, 3), + mask(IMM8, 4), + mask(IMM8, 5), + mask(IMM8, 6), + mask(IMM8, 7), + mask(IMM8, 8), + mask(IMM8, 9), + mask(IMM8, 10), + mask(IMM8, 11), + mask(IMM8, 12), + mask(IMM8, 13), + mask(IMM8, 14), + mask(IMM8, 15), + ], + )) + .into() +} + +/// Shifts `a` left by `IMM8` bytes while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_bslli_si128) + +pub fn _mm_bslli_si128(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + _mm_slli_si128_impl::(a) +} + +/// Shifts `a` right by `IMM8` bytes while shifting in zeros. 
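+///
+/// Bytewise, the intended behaviour is the following sketch (illustrative
+/// only; `imm8` stands for the byte shift):
+///
+/// ```
+/// fn bsrli_bytes(a: [u8; 16], imm8: usize) -> [u8; 16] {
+///     core::array::from_fn(|i| if i + imm8 < 16 { a[i + imm8] } else { 0 })
+/// }
+/// let x: [u8; 16] = core::array::from_fn(|i| i as u8);
+/// assert_eq!(bsrli_bytes(x, 1)[0], 1);
+/// assert_eq!(bsrli_bytes(x, 1)[15], 0);
+/// ```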
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_bsrli_si128) + +pub fn _mm_bsrli_si128(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + _mm_srli_si128_impl::(a) +} + +/// Shifts packed 16-bit integers in `a` left by `IMM8` while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_slli_epi16) + +pub fn _mm_slli_epi16(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + if IMM8 >= 16 { + _mm_setzero_si128() + } else { + simd_shl(BitVec::to_u16x8(a), u16x8::splat(IMM8 as u16)).into() + } +} + +/// Shifts packed 16-bit integers in `a` left by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sll_epi16) + +pub fn _mm_sll_epi16(a: __m128i, count: __m128i) -> __m128i { + psllw(BitVec::to_i16x8(a), BitVec::to_i16x8(count)).into() +} + +/// Shifts packed 32-bit integers in `a` left by `IMM8` while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_slli_epi32) + +pub fn _mm_slli_epi32(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + if IMM8 >= 32 { + _mm_setzero_si128() + } else { + simd_shl(BitVec::to_u32x4(a), u32x4::splat(IMM8 as u32)).into() + } +} + +/// Shifts packed 32-bit integers in `a` left by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sll_epi32) + +pub fn _mm_sll_epi32(a: __m128i, count: __m128i) -> __m128i { + pslld(BitVec::to_i32x4(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 64-bit integers in `a` left by `IMM8` while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_slli_epi64) + +pub fn _mm_slli_epi64(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + if IMM8 >= 64 { + _mm_setzero_si128() + } else { + simd_shl(BitVec::to_u64x2(a), u64x2::splat(IMM8 as u64)).into() + } +} + +/// Shifts packed 64-bit integers in `a` left by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sll_epi64) + +pub fn _mm_sll_epi64(a: __m128i, count: __m128i) -> __m128i { + psllq(BitVec::to_i64x2(a), BitVec::to_i64x2(count)).into() +} + +/// Shifts packed 16-bit integers in `a` right by `IMM8` while shifting in sign +/// bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srai_epi16) + +pub fn _mm_srai_epi16(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + simd_shr(BitVec::to_i16x8(a), i16x8::splat(IMM8.min(15) as i16)).into() +} + +/// Shifts packed 16-bit integers in `a` right by `count` while shifting in sign +/// bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sra_epi16) + +pub fn _mm_sra_epi16(a: __m128i, count: __m128i) -> __m128i { + psraw(BitVec::to_i16x8(a), BitVec::to_i16x8(count)).into() +} + +/// Shifts packed 32-bit integers in `a` right by `IMM8` while shifting in sign +/// bits. 
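+///
+/// The shift amount is clamped to 31, so an over-large immediate fills every
+/// lane with copies of its sign bit. Scalar sketch (illustrative only):
+///
+/// ```
+/// fn srai_i32(a: i32, imm8: i32) -> i32 {
+///     a >> imm8.min(31)
+/// }
+/// assert_eq!(srai_i32(-8, 2), -2);
+/// assert_eq!(srai_i32(-8, 40), -1);
+/// ```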
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srai_epi32) + +pub fn _mm_srai_epi32(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + simd_shr(BitVec::to_i32x4(a), i32x4::splat(IMM8.min(31))).into() +} + +/// Shifts packed 32-bit integers in `a` right by `count` while shifting in sign +/// bits. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sra_epi32) + +pub fn _mm_sra_epi32(a: __m128i, count: __m128i) -> __m128i { + psrad(BitVec::to_i32x4(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts `a` right by `IMM8` bytes while shifting in zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srli_si128) + +pub fn _mm_srli_si128(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + _mm_srli_si128_impl::(a) +} + +/// Implementation detail: converts the immediate argument of the +/// `_mm_srli_si128` intrinsic into a compile-time constant. + +fn _mm_srli_si128_impl(a: __m128i) -> __m128i { + const fn mask(shift: i32, i: u32) -> u64 { + if (shift as u32) > 15 { + (i + 16) as u64 + } else { + (i + (shift as u32)) as u64 + } + } + let x: i8x16 = simd_shuffle( + BitVec::to_i8x16(a), + i8x16::from_fn(|_| 0), + [ + mask(IMM8, 0), + mask(IMM8, 1), + mask(IMM8, 2), + mask(IMM8, 3), + mask(IMM8, 4), + mask(IMM8, 5), + mask(IMM8, 6), + mask(IMM8, 7), + mask(IMM8, 8), + mask(IMM8, 9), + mask(IMM8, 10), + mask(IMM8, 11), + mask(IMM8, 12), + mask(IMM8, 13), + mask(IMM8, 14), + mask(IMM8, 15), + ], + ); + x.into() +} + +/// Shifts packed 16-bit integers in `a` right by `IMM8` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srli_epi16) + +pub fn _mm_srli_epi16(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + if IMM8 >= 16 { + _mm_setzero_si128() + } else { + simd_shr(BitVec::to_u16x8(a), u16x8::splat(IMM8 as u16)).into() + } +} + +/// Shifts packed 16-bit integers in `a` right by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srl_epi16) + +pub fn _mm_srl_epi16(a: __m128i, count: __m128i) -> __m128i { + psrlw(BitVec::to_i16x8(a), BitVec::to_i16x8(count)).into() +} + +/// Shifts packed 32-bit integers in `a` right by `IMM8` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srli_epi32) + +pub fn _mm_srli_epi32(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + if IMM8 >= 32 { + _mm_setzero_si128() + } else { + simd_shr(BitVec::to_u32x4(a), u32x4::splat(IMM8 as u32)).into() + } +} + +/// Shifts packed 32-bit integers in `a` right by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srl_epi32) + +pub fn _mm_srl_epi32(a: __m128i, count: __m128i) -> __m128i { + psrld(BitVec::to_i32x4(a), BitVec::to_i32x4(count)).into() +} + +/// Shifts packed 64-bit integers in `a` right by `IMM8` while shifting in +/// zeros. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srli_epi64) + +pub fn _mm_srli_epi64(a: __m128i) -> __m128i { + // TODO // static_assert_uimm_bits!(IMM8, 8); + + if IMM8 >= 64 { + BitVec::from_fn(|_| Bit::Zero) + } else { + BitVec::from_u64x2(simd_shr(BitVec::to_u64x2(a), u64x2::splat(IMM8 as u64))) + } +} + +/// Shifts packed 64-bit integers in `a` right by `count` while shifting in +/// zeros. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_srl_epi64) + +pub fn _mm_srl_epi64(a: __m128i, count: __m128i) -> __m128i { + psrlq(BitVec::to_i64x2(a), BitVec::to_i64x2(count)).into() +} + +/// Computes the bitwise AND of 128 bits (representing integer data) in `a` and +/// `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_and_si128) + +pub fn _mm_and_si128(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_fn(|i| a[i] & b[i]) +} + +/// Computes the bitwise NOT of 128 bits (representing integer data) in `a` and +/// then AND with `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_andnot_si128) + +pub fn _mm_andnot_si128(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_fn(|i| BitVec::<128>::from_fn(|i| _mm_set1_epi8(-1)[i] ^ a[i])[i] & b[i]) +} + +/// Computes the bitwise OR of 128 bits (representing integer data) in `a` and +/// `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_or_si128) + +pub fn _mm_or_si128(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_fn(|i| a[i] | b[i]) +} + +/// Computes the bitwise XOR of 128 bits (representing integer data) in `a` and +/// `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_xor_si128) + +pub fn _mm_xor_si128(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_fn(|i| a[i] ^ b[i]) +} + +/// Compares packed 8-bit integers in `a` and `b` for equality. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpeq_epi8) + +pub fn _mm_cmpeq_epi8(a: __m128i, b: __m128i) -> __m128i { + (simd_eq(BitVec::to_i8x16(a), BitVec::to_i8x16(b))).into() +} + +/// Compares packed 16-bit integers in `a` and `b` for equality. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpeq_epi16) + +pub fn _mm_cmpeq_epi16(a: __m128i, b: __m128i) -> __m128i { + (simd_eq(BitVec::to_i16x8(a), BitVec::to_i16x8(b))).into() +} + +/// Compares packed 32-bit integers in `a` and `b` for equality. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpeq_epi32) + +pub fn _mm_cmpeq_epi32(a: __m128i, b: __m128i) -> __m128i { + (simd_eq(BitVec::to_i32x4(a), BitVec::to_i32x4(b))).into() +} + +/// Compares packed 8-bit integers in `a` and `b` for greater-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpgt_epi8) + +pub fn _mm_cmpgt_epi8(a: __m128i, b: __m128i) -> __m128i { + (simd_gt(BitVec::to_i8x16(a), BitVec::to_i8x16(b))).into() +} + +/// Compares packed 16-bit integers in `a` and `b` for greater-than. 
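+///
+/// Each result lane is a mask: all ones (`-1` as a signed integer) where the
+/// comparison holds and `0` where it does not. Scalar sketch (illustrative
+/// only):
+///
+/// ```
+/// fn cmpgt_i16(a: i16, b: i16) -> i16 {
+///     if a > b { -1 } else { 0 }
+/// }
+/// assert_eq!(cmpgt_i16(3, 2), -1);
+/// assert_eq!(cmpgt_i16(2, 3), 0);
+/// ```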
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpgt_epi16) + +pub fn _mm_cmpgt_epi16(a: __m128i, b: __m128i) -> __m128i { + (simd_gt(BitVec::to_i16x8(a), BitVec::to_i16x8(b))).into() +} + +/// Compares packed 32-bit integers in `a` and `b` for greater-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpgt_epi32) + +pub fn _mm_cmpgt_epi32(a: __m128i, b: __m128i) -> __m128i { + (simd_gt(BitVec::to_i32x4(a), BitVec::to_i32x4(b))).into() +} + +/// Compares packed 8-bit integers in `a` and `b` for less-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmplt_epi8) + +pub fn _mm_cmplt_epi8(a: __m128i, b: __m128i) -> __m128i { + (simd_lt(BitVec::to_i8x16(a), BitVec::to_i8x16(b))).into() +} + +/// Compares packed 16-bit integers in `a` and `b` for less-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmplt_epi16) + +pub fn _mm_cmplt_epi16(a: __m128i, b: __m128i) -> __m128i { + (simd_lt(BitVec::to_i16x8(a), BitVec::to_i16x8(b))).into() +} + +/// Compares packed 32-bit integers in `a` and `b` for less-than. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmplt_epi32) + +pub fn _mm_cmplt_epi32(a: __m128i, b: __m128i) -> __m128i { + (simd_lt(BitVec::to_i32x4(a), BitVec::to_i32x4(b))).into() +} + +pub fn _mm_cvtsi32_si128(a: i32) -> __m128i { + i32x4::from_fn(|i| if i == 0 { a } else { 0 }).into() +} + +/// Returns the lowest element of `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cvtsi128_si32) + +pub fn _mm_cvtsi128_si32(a: __m128i) -> i32 { + simd_extract(BitVec::to_i32x4(a), 0) +} + +/// Sets packed 64-bit integers with the supplied values, from highest to +/// lowest. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set_epi64x) + +// no particular instruction to test + +pub fn _mm_set_epi64x(e1: i64, e0: i64) -> __m128i { + i64x2::from_fn(|i| if i == 0 { e0 } else { e1 }).into() +} + +/// Sets packed 32-bit integers with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set_epi32) +// no particular instruction to test +pub fn _mm_set_epi32(e3: i32, e2: i32, e1: i32, e0: i32) -> __m128i { + let vec = [e0, e1, e2, e3]; + BitVec::from_i32x4(i32x4::from_fn(|i| vec[i as usize])) +} + +/// Sets packed 16-bit integers with the supplied values. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set_epi16) + +// no particular instruction to test + +pub fn _mm_set_epi16( + e7: i16, + e6: i16, + e5: i16, + e4: i16, + e3: i16, + e2: i16, + e1: i16, + e0: i16, +) -> __m128i { + let vec = [e0, e1, e2, e3, e4, e5, e6, e7]; + BitVec::from_i16x8(i16x8::from_fn(|i| vec[i as usize])) +} + +/// Sets packed 8-bit integers with the supplied values. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set_epi8) +// no particular instruction to test +pub fn _mm_set_epi8( + e15: i8, + e14: i8, + e13: i8, + e12: i8, + e11: i8, + e10: i8, + e9: i8, + e8: i8, + e7: i8, + e6: i8, + e5: i8, + e4: i8, + e3: i8, + e2: i8, + e1: i8, + e0: i8, +) -> __m128i { + let vec = [ + e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13, e14, e15, + ]; + BitVec::from_i8x16(i8x16::from_fn(|i| vec[i as usize])) +} + +/// Broadcasts 64-bit integer `a` to all elements. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set1_epi64x) + +// no particular instruction to test + +pub fn _mm_set1_epi64x(a: i64) -> __m128i { + _mm_set_epi64x(a, a) +} + +/// Broadcasts 32-bit integer `a` to all elements. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set1_epi32) + +// no particular instruction to test + +pub fn _mm_set1_epi32(a: i32) -> __m128i { + _mm_set_epi32(a, a, a, a) +} + +/// Broadcasts 16-bit integer `a` to all elements. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set1_epi16) + +// no particular instruction to test + +pub fn _mm_set1_epi16(a: i16) -> __m128i { + BitVec::from_i16x8(i16x8::from_fn(|_| a)) +} + +/// Broadcasts 8-bit integer `a` to all elements. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_set1_epi8) + +// no particular instruction to test + +pub fn _mm_set1_epi8(a: i8) -> __m128i { + _mm_set_epi8(a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a) +} + +/// Sets packed 32-bit integers with the supplied values in reverse order. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_setr_epi32) + +// no particular instruction to test + +pub fn _mm_setr_epi32(e3: i32, e2: i32, e1: i32, e0: i32) -> __m128i { + _mm_set_epi32(e0, e1, e2, e3) +} + +/// Sets packed 16-bit integers with the supplied values in reverse order. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_setr_epi16) + +// no particular instruction to test + +pub fn _mm_setr_epi16( + e7: i16, + e6: i16, + e5: i16, + e4: i16, + e3: i16, + e2: i16, + e1: i16, + e0: i16, +) -> __m128i { + _mm_set_epi16(e0, e1, e2, e3, e4, e5, e6, e7) +} + +/// Sets packed 8-bit integers with the supplied values in reverse order. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_setr_epi8) + +// no particular instruction to test + +pub fn _mm_setr_epi8( + e15: i8, + e14: i8, + e13: i8, + e12: i8, + e11: i8, + e10: i8, + e9: i8, + e8: i8, + e7: i8, + e6: i8, + e5: i8, + e4: i8, + e3: i8, + e2: i8, + e1: i8, + e0: i8, +) -> __m128i { + _mm_set_epi8( + e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13, e14, e15, + ) +} + +/// Returns a vector with all elements set to zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_setzero_si128) + +pub fn _mm_setzero_si128() -> __m128i { + BitVec::from_fn(|_| Bit::Zero) +} + +/// Returns a vector where the low element is extracted from `a` and its upper +/// element is zero. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_move_epi64) + +// FIXME movd on msvc, movd on i686 + +pub fn _mm_move_epi64(a: __m128i) -> __m128i { + let r: i64x2 = simd_shuffle(BitVec::to_i64x2(a), i64x2::from_fn(|_| 0), [0, 2]); + r.into() +} + +/// Converts packed 16-bit integers from `a` and `b` to packed 8-bit integers +/// using signed saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_packs_epi16) + +pub fn _mm_packs_epi16(a: __m128i, b: __m128i) -> __m128i { + packsswb(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Converts packed 32-bit integers from `a` and `b` to packed 16-bit integers +/// using signed saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_packs_epi32) + +pub fn _mm_packs_epi32(a: __m128i, b: __m128i) -> __m128i { + packssdw(BitVec::to_i32x4(a), BitVec::to_i32x4(b)).into() +} + +/// Converts packed 16-bit integers from `a` and `b` to packed 8-bit integers +/// using unsigned saturation. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_packus_epi16) + +pub fn _mm_packus_epi16(a: __m128i, b: __m128i) -> __m128i { + packuswb(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Returns the `imm8` element of `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_extract_epi16) + +pub fn _mm_extract_epi16(a: __m128i) -> i32 { + // static_assert_uimm_bits!(IMM8, 3); + simd_extract(BitVec::to_u16x8(a), IMM8 as u64) as i32 +} + +/// Returns a new vector where the `imm8` element of `a` is replaced with `i`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_insert_epi16) + +pub fn _mm_insert_epi16(a: __m128i, i: i32) -> __m128i { + // static_assert_uimm_bits!(IMM8, 3); + simd_insert(BitVec::to_i16x8(a), IMM8 as u64, i as i16).into() +} + +/// Returns a mask of the most significant bit of each element in `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_movemask_epi8) + +pub fn _mm_movemask_epi8(a: __m128i) -> i32 { + let z = i8x16::from_fn(|_| 0); + let m: i8x16 = simd_lt(BitVec::to_i8x16(a), z); + let r = simd_bitmask_little!(15, m, u16); + r as u32 as i32 +} + +/// Shuffles 32-bit integers in `a` using the control in `IMM8`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi32) + +pub fn _mm_shuffle_epi32(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + let a = BitVec::to_i32x4(a); + let x: i32x4 = simd_shuffle( + a, + a, + [ + IMM8 as u64 & 0b11, + (IMM8 as u64 >> 2) & 0b11, + (IMM8 as u64 >> 4) & 0b11, + (IMM8 as u64 >> 6) & 0b11, + ], + ); + x.into() +} + +/// Shuffles 16-bit integers in the high 64 bits of `a` using the control in +/// `IMM8`. +/// +/// Put the results in the high 64 bits of the returned vector, with the low 64 +/// bits being copied from `a`. 
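+///
+/// `IMM8` packs four 2-bit selectors (bits 0-1 choose the new element 4,
+/// bits 2-3 the new element 5, and so on), each picking one of the four high
+/// lanes. A sketch of the index decoding (illustrative only):
+///
+/// ```
+/// fn shufflehi_indices(imm8: u8) -> [usize; 4] {
+///     core::array::from_fn(|k| 4 + ((imm8 >> (2 * k)) & 0b11) as usize)
+/// }
+/// assert_eq!(shufflehi_indices(0b00_01_10_11), [7, 6, 5, 4]);
+/// ```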
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shufflehi_epi16) + +pub fn _mm_shufflehi_epi16(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + let a = BitVec::to_i16x8(a); + let x: i16x8 = simd_shuffle( + a, + a, + [ + 0, + 1, + 2, + 3, + (IMM8 as u64 & 0b11) + 4, + ((IMM8 as u64 >> 2) & 0b11) + 4, + ((IMM8 as u64 >> 4) & 0b11) + 4, + ((IMM8 as u64 >> 6) & 0b11) + 4, + ], + ); + x.into() +} + +/// Shuffles 16-bit integers in the low 64 bits of `a` using the control in +/// `IMM8`. +/// +/// Put the results in the low 64 bits of the returned vector, with the high 64 +/// bits being copied from `a`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shufflelo_epi16) + +pub fn _mm_shufflelo_epi16(a: __m128i) -> __m128i { + // static_assert_uimm_bits!(IMM8, 8); + + let a = BitVec::to_i16x8(a); + let x: i16x8 = simd_shuffle( + a, + a, + [ + IMM8 as u64 & 0b11, + (IMM8 as u64 >> 2) & 0b11, + (IMM8 as u64 >> 4) & 0b11, + (IMM8 as u64 >> 6) & 0b11, + 4, + 5, + 6, + 7, + ], + ); + x.into() +} + +/// Unpacks and interleave 8-bit integers from the high half of `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpackhi_epi8) + +pub fn _mm_unpackhi_epi8(a: __m128i, b: __m128i) -> __m128i { + (simd_shuffle( + BitVec::to_i8x16(a), + BitVec::to_i8x16(b), + [8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31], + )) + .into() +} + +/// Unpacks and interleave 16-bit integers from the high half of `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpackhi_epi16) + +pub fn _mm_unpackhi_epi16(a: __m128i, b: __m128i) -> __m128i { + let x = simd_shuffle( + BitVec::to_i16x8(a), + BitVec::to_i16x8(b), + [4, 12, 5, 13, 6, 14, 7, 15], + ); + (x).into() +} + +/// Unpacks and interleave 32-bit integers from the high half of `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpackhi_epi32) + +pub fn _mm_unpackhi_epi32(a: __m128i, b: __m128i) -> __m128i { + (simd_shuffle(BitVec::to_i32x4(a), BitVec::to_i32x4(b), [2, 6, 3, 7])).into() +} + +/// Unpacks and interleave 64-bit integers from the high half of `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpackhi_epi64) + +pub fn _mm_unpackhi_epi64(a: __m128i, b: __m128i) -> __m128i { + (simd_shuffle(BitVec::to_i64x2(a), BitVec::to_i64x2(b), [1, 3])).into() +} + +/// Unpacks and interleave 8-bit integers from the low half of `a` and `b`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpacklo_epi8) + +pub fn _mm_unpacklo_epi8(a: __m128i, b: __m128i) -> __m128i { + (simd_shuffle( + BitVec::to_i8x16(a), + BitVec::to_i8x16(b), + [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23], + )) + .into() +} + +/// Unpacks and interleave 16-bit integers from the low half of `a` and `b`. 
+///
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpacklo_epi16)
+
+pub fn _mm_unpacklo_epi16(a: __m128i, b: __m128i) -> __m128i {
+    let x = simd_shuffle(
+        BitVec::to_i16x8(a),
+        BitVec::to_i16x8(b),
+        [0, 8, 1, 9, 2, 10, 3, 11],
+    );
+    x.into()
+}
+
+/// Unpacks and interleave 32-bit integers from the low half of `a` and `b`.
+///
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpacklo_epi32)
+
+pub fn _mm_unpacklo_epi32(a: __m128i, b: __m128i) -> __m128i {
+    simd_shuffle(BitVec::to_i32x4(a), BitVec::to_i32x4(b), [0, 4, 1, 5]).into()
+}
+
+/// Unpacks and interleave 64-bit integers from the low half of `a` and `b`.
+///
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_unpacklo_epi64)
+
+pub fn _mm_unpacklo_epi64(a: __m128i, b: __m128i) -> __m128i {
+    simd_shuffle(BitVec::to_i64x2(a), BitVec::to_i64x2(b), [0, 2]).into()
+}
+
+/// Returns a vector of type `__m128i` with indeterminate elements.
+/// Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically
+/// picks some valid value and is not equivalent to [`core::mem::MaybeUninit`].
+/// In practice, this is typically equivalent to [`core::mem::zeroed`].
+///
+/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_undefined_si128)
+
+pub fn _mm_undefined_si128() -> __m128i {
+    BitVec::from_fn(|_| Bit::Zero)
+}
diff --git a/testable-simd-models/src/core_arch/x86/models/ssse3.rs b/testable-simd-models/src/core_arch/x86/models/ssse3.rs
new file mode 100644
index 0000000000000..8d0488430756c
--- /dev/null
+++ b/testable-simd-models/src/core_arch/x86/models/ssse3.rs
@@ -0,0 +1,369 @@
+//! 
Supplemental Streaming SIMD Extensions 3 (SSSE3) + +use crate::abstractions::{bitvec::BitVec, simd::*}; + +use super::types::*; + +mod c_extern { + use crate::abstractions::simd::*; + pub fn pshufb128(a: u8x16, b: u8x16) -> u8x16 { + u8x16::from_fn(|i| if b[i] > 127 { 0 } else { a[(b[i] % 16) as u64] }) + } + + pub fn phaddw128(a: i16x8, b: i16x8) -> i16x8 { + i16x8::from_fn(|i| { + if i < 4 { + a[2 * i].wrapping_add(a[2 * i + 1]) + } else { + b[2 * (i - 4)].wrapping_add(b[2 * (i - 4) + 1]) + } + }) + } + + pub fn phaddsw128(a: i16x8, b: i16x8) -> i16x8 { + i16x8::from_fn(|i| { + if i < 4 { + a[2 * i].saturating_add(a[2 * i + 1]) + } else { + b[2 * (i - 4)].saturating_add(b[2 * (i - 4) + 1]) + } + }) + } + + pub fn phaddd128(a: i32x4, b: i32x4) -> i32x4 { + i32x4::from_fn(|i| { + if i < 2 { + a[2 * i].wrapping_add(a[2 * i + 1]) + } else { + b[2 * (i - 2)].wrapping_add(b[2 * (i - 2) + 1]) + } + }) + } + + pub fn phsubw128(a: i16x8, b: i16x8) -> i16x8 { + i16x8::from_fn(|i| { + if i < 4 { + a[2 * i].wrapping_sub(a[2 * i + 1]) + } else { + b[2 * (i - 4)].wrapping_sub(b[2 * (i - 4) + 1]) + } + }) + } + + pub fn phsubsw128(a: i16x8, b: i16x8) -> i16x8 { + i16x8::from_fn(|i| { + if i < 4 { + a[2 * i].saturating_sub(a[2 * i + 1]) + } else { + b[2 * (i - 4)].saturating_sub(b[2 * (i - 4) + 1]) + } + }) + } + + pub fn phsubd128(a: i32x4, b: i32x4) -> i32x4 { + i32x4::from_fn(|i| { + if i < 2 { + a[2 * i].wrapping_sub(a[2 * i + 1]) + } else { + b[2 * (i - 2)].wrapping_sub(b[2 * (i - 2) + 1]) + } + }) + } + + pub fn pmaddubsw128(a: u8x16, b: i8x16) -> i16x8 { + i16x8::from_fn(|i| { + ((a[2 * i] as u8 as u16 as i16) * (b[2 * i] as i8 as i16)) + .saturating_add((a[2 * i + 1] as u8 as u16 as i16) * (b[2 * i + 1] as i8 as i16)) + }) + } + + pub fn pmulhrsw128(a: i16x8, b: i16x8) -> i16x8 { + i16x8::from_fn(|i| { + let temp = (a[i] as i32) * (b[i] as i32); + let temp = (temp >> 14).wrapping_add(1) >> 1; + temp as i16 + }) + } + + pub fn psignb128(a: i8x16, b: i8x16) -> i8x16 { + i8x16::from_fn(|i| { + if b[i] < 0 { + if a[i] == i8::MIN { + a[i] + } else { + -a[i] + } + } else if b[i] > 0 { + a[i] + } else { + 0 + } + }) + } + + pub fn psignw128(a: i16x8, b: i16x8) -> i16x8 { + i16x8::from_fn(|i| { + if b[i] < 0 { + if a[i] == i16::MIN { + a[i] + } else { + -a[i] + } + } else if b[i] > 0 { + a[i] + } else { + 0 + } + }) + } + + pub fn psignd128(a: i32x4, b: i32x4) -> i32x4 { + i32x4::from_fn(|i| { + if b[i] < 0 { + if a[i] == i32::MIN { + a[i] + } else { + -a[i] + } + } else if b[i] > 0 { + a[i] + } else { + 0 + } + }) + } +} + +use super::sse2::*; +use c_extern::*; +/// Computes the absolute value of packed 8-bit signed integers in `a` and +/// return the unsigned results. 
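+///
+/// As with the hardware instruction, `i8::MIN` is left unchanged (bit pattern
+/// `0x80`), because its absolute value is not representable in `i8`. Scalar
+/// sketch (illustrative only):
+///
+/// ```
+/// fn abs_i8(a: i8) -> i8 {
+///     if a < 0 { a.wrapping_neg() } else { a }
+/// }
+/// assert_eq!(abs_i8(-5), 5);
+/// assert_eq!(abs_i8(i8::MIN), i8::MIN);
+/// ```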
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_abs_epi8) +pub fn _mm_abs_epi8(a: __m128i) -> __m128i { + let a = BitVec::to_i8x16(a); + let zero = i8x16::from_fn(|_| 0); + let r = simd_select(simd_lt(a, zero), simd_neg(a), a); + BitVec::from_i8x16(r) +} + +/// Computes the absolute value of each of the packed 16-bit signed integers in +/// `a` and +/// return the 16-bit unsigned integer +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_abs_epi16) +pub fn _mm_abs_epi16(a: __m128i) -> __m128i { + let a = BitVec::to_i16x8(a); + let zero = i16x8::from_fn(|_| 0); + let r = simd_select(simd_lt(a, zero), simd_neg(a), a); + BitVec::from_i16x8(r) +} + +/// Computes the absolute value of each of the packed 32-bit signed integers in +/// `a` and +/// return the 32-bit unsigned integer +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_abs_epi32) +pub fn _mm_abs_epi32(a: __m128i) -> __m128i { + let a = BitVec::to_i32x4(a); + let zero = i32x4::from_fn(|_| 0); + let r = simd_select(simd_lt(a, zero), simd_neg(a), a); + BitVec::from_i32x4(r) +} + +/// Shuffles bytes from `a` according to the content of `b`. +/// +/// The last 4 bits of each byte of `b` are used as addresses +/// into the 16 bytes of `a`. +/// +/// In addition, if the highest significant bit of a byte of `b` +/// is set, the respective destination byte is set to 0. +/// +/// Picturing `a` and `b` as `[u8; 16]`, `_mm_shuffle_epi8` is +/// logically equivalent to: +/// +/// ``` +/// fn mm_shuffle_epi8(a: [u8; 16], b: [u8; 16]) -> [u8; 16] { +/// let mut r = [0u8; 16]; +/// for i in 0..16 { +/// // if the most significant bit of b is set, +/// // then the destination byte is set to 0. +/// if b[i] & 0x80 == 0u8 { +/// r[i] = a[(b[i] % 16) as usize]; +/// } +/// } +/// r +/// } +/// ``` +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8) +pub fn _mm_shuffle_epi8(a: __m128i, b: __m128i) -> __m128i { + BitVec::from_u8x16(pshufb128(BitVec::to_u8x16(a), BitVec::to_u8x16(b))) +} + +/// Concatenate 16-byte blocks in `a` and `b` into a 32-byte temporary result, +/// shift the result right by `n` bytes, and returns the low 16 bytes. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_alignr_epi8) + +pub fn _mm_alignr_epi8(a: __m128i, b: __m128i) -> __m128i { + // TODO static_assert_uimm_bits!(IMM8, 8); + // If palignr is shifting the pair of vectors more than the size of two + // lanes, emit zero. + if IMM8 > 32 { + return _mm_setzero_si128(); + } + // If palignr is shifting the pair of input vectors more than one lane, + // but less than two lanes, convert to shifting in zeroes. + let (a, b) = if IMM8 > 16 { + (_mm_setzero_si128(), a) + } else { + (a, b) + }; + const fn mask(shift: u64, i: u64) -> u64 { + if shift > 32 { + // Unused, but needs to be a valid index. 
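+            // (When `IMM8 > 32` the function has already returned zero above,
+            // so this branch only needs to yield some in-range shuffle index.)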
+ i + } else if shift > 16 { + shift - 16 + i + } else { + shift + i + } + } + + let r: i8x16 = simd_shuffle( + BitVec::to_i8x16(b), + BitVec::to_i8x16(a), + [ + mask(IMM8 as u64, 0), + mask(IMM8 as u64, 1), + mask(IMM8 as u64, 2), + mask(IMM8 as u64, 3), + mask(IMM8 as u64, 4), + mask(IMM8 as u64, 5), + mask(IMM8 as u64, 6), + mask(IMM8 as u64, 7), + mask(IMM8 as u64, 8), + mask(IMM8 as u64, 9), + mask(IMM8 as u64, 10), + mask(IMM8 as u64, 11), + mask(IMM8 as u64, 12), + mask(IMM8 as u64, 13), + mask(IMM8 as u64, 14), + mask(IMM8 as u64, 15), + ], + ); + r.into() +} + +/// Horizontally adds the adjacent pairs of values contained in 2 packed +/// 128-bit vectors of `[8 x i16]`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_hadd_epi16) + +pub fn _mm_hadd_epi16(a: __m128i, b: __m128i) -> __m128i { + phaddw128(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Horizontally adds the adjacent pairs of values contained in 2 packed +/// 128-bit vectors of `[8 x i16]`. Positive sums greater than 7FFFh are +/// saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_hadds_epi16) + +pub fn _mm_hadds_epi16(a: __m128i, b: __m128i) -> __m128i { + phaddsw128(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Horizontally adds the adjacent pairs of values contained in 2 packed +/// 128-bit vectors of `[4 x i32]`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_hadd_epi32) + +pub fn _mm_hadd_epi32(a: __m128i, b: __m128i) -> __m128i { + phaddd128(BitVec::to_i32x4(a), BitVec::to_i32x4(b)).into() +} + +/// Horizontally subtract the adjacent pairs of values contained in 2 +/// packed 128-bit vectors of `[8 x i16]`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_hsub_epi16) + +pub fn _mm_hsub_epi16(a: __m128i, b: __m128i) -> __m128i { + phsubw128(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Horizontally subtract the adjacent pairs of values contained in 2 +/// packed 128-bit vectors of `[8 x i16]`. Positive differences greater than +/// 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are +/// saturated to 8000h. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_hsubs_epi16) + +pub fn _mm_hsubs_epi16(a: __m128i, b: __m128i) -> __m128i { + phsubsw128(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Horizontally subtract the adjacent pairs of values contained in 2 +/// packed 128-bit vectors of `[4 x i32]`. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_hsub_epi32) + +pub fn _mm_hsub_epi32(a: __m128i, b: __m128i) -> __m128i { + phsubd128(BitVec::to_i32x4(a), BitVec::to_i32x4(b)).into() +} + +/// Multiplies corresponding pairs of packed 8-bit unsigned integer +/// values contained in the first source operand and packed 8-bit signed +/// integer values contained in the second source operand, add pairs of +/// contiguous products with signed saturation, and writes the 16-bit sums to +/// the corresponding bits in the destination. 
+/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_maddubs_epi16) + +pub fn _mm_maddubs_epi16(a: __m128i, b: __m128i) -> __m128i { + pmaddubsw128(BitVec::to_u8x16(a), BitVec::to_i8x16(b)).into() +} + +/// Multiplies packed 16-bit signed integer values, truncate the 32-bit +/// product to the 18 most significant bits by right-shifting, round the +/// truncated value by adding 1, and write bits `[16:1]` to the destination. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_mulhrs_epi16) + +pub fn _mm_mulhrs_epi16(a: __m128i, b: __m128i) -> __m128i { + pmulhrsw128(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Negates packed 8-bit integers in `a` when the corresponding signed 8-bit +/// integer in `b` is negative, and returns the result. +/// Elements in result are zeroed out when the corresponding element in `b` +/// is zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sign_epi8) + +pub fn _mm_sign_epi8(a: __m128i, b: __m128i) -> __m128i { + psignb128(BitVec::to_i8x16(a), BitVec::to_i8x16(b)).into() +} + +/// Negates packed 16-bit integers in `a` when the corresponding signed 16-bit +/// integer in `b` is negative, and returns the results. +/// Elements in result are zeroed out when the corresponding element in `b` +/// is zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sign_epi16) + +pub fn _mm_sign_epi16(a: __m128i, b: __m128i) -> __m128i { + psignw128(BitVec::to_i16x8(a), BitVec::to_i16x8(b)).into() +} + +/// Negates packed 32-bit integers in `a` when the corresponding signed 32-bit +/// integer in `b` is negative, and returns the results. +/// Element in result are zeroed out when the corresponding element in `b` +/// is zero. +/// +/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_sign_epi32) + +pub fn _mm_sign_epi32(a: __m128i, b: __m128i) -> __m128i { + psignd128(BitVec::to_i32x4(a), BitVec::to_i32x4(b)).into() +} diff --git a/testable-simd-models/src/core_arch/x86/tests/avx.rs b/testable-simd-models/src/core_arch/x86/tests/avx.rs new file mode 100644 index 0000000000000..4ffa0dc139b9d --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/tests/avx.rs @@ -0,0 +1,132 @@ +use super::types::*; +use super::upstream; +use crate::abstractions::bitvec::BitVec; +use crate::helpers::test::HasRandom; + +/// Derives tests for a given intrinsics. Test that a given intrinsics and its model compute the same thing over random values (1000 by default). +macro_rules! mk { + ($([$N:literal])?$name:ident$({$(<$($c:literal),*>),*})?($($x:ident : $ty:ident),*)) => { + #[test] + fn $name() { + #[allow(unused)] + const N: usize = { + let n: usize = 1000; + $(let n: usize = $N;)? 
+ n + }; + mk!(@[N]$name$($(<$($c),*>)*)?($($x : $ty),*)); + } + }; + (@[$N:ident]$name:ident$(<$($c:literal),*>)?($($x:ident : $ty:ident),*)) => { + for _ in 0..$N { + $(let $x = $ty::random();)* + assert_eq!(super::super::models::avx::$name$(::<$($c,)*>)?($($x.into(),)*), unsafe { + BitVec::from(upstream::$name$(::<$($c,)*>)?($($x.into(),)*)).into() + }); + } + }; + (@[$N:ident]$name:ident<$($c1:literal),*>$(<$($c:literal),*>)*($($x:ident : $ty:ident),*)) => { + let one = || { + mk!(@[$N]$name<$($c1),*>($($x : $ty),*)); + }; + one(); + mk!(@[$N]$name$(<$($c),*>)*($($x : $ty),*)); + } +} +mk!(_mm256_blendv_ps(a: __m256, b: __m256, c: __m256)); + +#[test] +fn _mm256_movemask_ps() { + let n = 1000; + + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx::_mm256_movemask_ps(a.into()), + unsafe { upstream::_mm256_movemask_ps(a.into()) } + ); + } +} + +#[test] +fn _mm256_testz_si256() { + let n = 1000; + + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + let b: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx::_mm256_testz_si256(a.into(), b.into()), + unsafe { upstream::_mm256_testz_si256(a.into(), b.into()) } + ); + } +} + +mk!(_mm256_setzero_ps()); +mk!(_mm256_setzero_si256()); +mk!(_mm256_set_epi8( + e00: i8, + e01: i8, + e02: i8, + e03: i8, + e04: i8, + e05: i8, + e06: i8, + e07: i8, + e08: i8, + e09: i8, + e10: i8, + e11: i8, + e12: i8, + e13: i8, + e14: i8, + e15: i8, + e16: i8, + e17: i8, + e18: i8, + e19: i8, + e20: i8, + e21: i8, + e22: i8, + e23: i8, + e24: i8, + e25: i8, + e26: i8, + e27: i8, + e28: i8, + e29: i8, + e30: i8, + e31: i8 +)); +mk!(_mm256_set_epi16( + e00: i16, + e01: i16, + e02: i16, + e03: i16, + e04: i16, + e05: i16, + e06: i16, + e07: i16, + e08: i16, + e09: i16, + e10: i16, + e11: i16, + e12: i16, + e13: i16, + e14: i16, + e15: i16 +)); +mk!(_mm256_set_epi32( + e0: i32, + e1: i32, + e2: i32, + e3: i32, + e4: i32, + e5: i32, + e6: i32, + e7: i32 +)); +mk!(_mm256_set_epi64x(a: i64, b: i64, c: i64, d: i64)); +mk!(_mm256_set1_epi8(a: i8)); +mk!(_mm256_set1_epi16(a: i16)); +mk!(_mm256_set1_epi32(a: i32)); diff --git a/testable-simd-models/src/core_arch/x86/tests/avx2.rs b/testable-simd-models/src/core_arch/x86/tests/avx2.rs new file mode 100644 index 0000000000000..a1b8378566403 --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/tests/avx2.rs @@ -0,0 +1,531 @@ +use super::upstream; +use crate::abstractions::bitvec::BitVec; +use crate::helpers::test::HasRandom; + +/// Derives tests for a given intrinsics. Test that a given intrinsics and its model compute the same thing over random values (1000 by default). +macro_rules! mk { + ($([$N:literal])?$name:ident$({$(<$($c:literal),*>),*})?($($x:ident : $ty:ident),*)) => { + #[test] + fn $name() { + #[allow(unused)] + const N: usize = { + let n: usize = 1000; + $(let n: usize = $N;)? 
+ n + }; + mk!(@[N]$name$($(<$($c),*>)*)?($($x : $ty),*)); + } + }; + (@[$N:ident]$name:ident$(<$($c:literal),*>)?($($x:ident : $ty:ident),*)) => { + for _ in 0..$N { + $(let $x = $ty::random();)* + assert_eq!(super::super::models::avx2::$name$(::<$($c,)*>)?($($x.into(),)*), unsafe { + BitVec::from(upstream::$name$(::<$($c,)*>)?($($x.into(),)*)).into() + }); + } + }; + (@[$N:ident]$name:ident<$($c1:literal),*>$(<$($c:literal),*>)*($($x:ident : $ty:ident),*)) => { + let one = || { + mk!(@[$N]$name<$($c1),*>($($x : $ty),*)); + }; + one(); + mk!(@[$N]$name$(<$($c),*>)*($($x : $ty),*)); + } +} + +mk!(_mm256_abs_epi32(a: BitVec)); +mk!(_mm256_abs_epi16(a: BitVec)); +mk!(_mm256_abs_epi8(a: BitVec)); +mk!(_mm256_add_epi64(a: BitVec, b: BitVec)); +mk!(_mm256_add_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_add_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_add_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_adds_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_adds_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_adds_epu8(a: BitVec, b: BitVec)); +mk!(_mm256_adds_epu16(a: BitVec, b: BitVec)); +mk!([100]_mm256_alignr_epi8{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec, b: BitVec)); 
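+// For reference, the invocation above instantiates the comparison once per constant in the
+// brace list; for a single value, say `IMM8 = 3`, the generated test body is roughly the
+// following loop (the `[100]` prefix lowers the default of 1000 random inputs per constant
+// to 100):
+//
+//     for _ in 0..100 {
+//         let a = BitVec::random();
+//         let b = BitVec::random();
+//         assert_eq!(
+//             super::super::models::avx2::_mm256_alignr_epi8::<3>(a.into(), b.into()),
+//             unsafe {
+//                 BitVec::from(upstream::_mm256_alignr_epi8::<3>(a.into(), b.into())).into()
+//             }
+//         );
+//     }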
+mk!([100]_mm256_permute2x128_si256{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec, b: BitVec)); +mk!(_mm256_blendv_epi8(a: BitVec, b: BitVec, mask: BitVec)); +mk!(_mm_broadcastb_epi8(a: BitVec)); +mk!(_mm256_broadcastb_epi8(a: BitVec)); +mk!(_mm_broadcastd_epi32(a: BitVec)); +mk!(_mm256_broadcastd_epi32(a: BitVec)); +mk!(_mm_broadcastq_epi64(a: BitVec)); +mk!(_mm256_broadcastq_epi64(a: BitVec)); +mk!(_mm_broadcastsi128_si256(a: BitVec)); +mk!(_mm256_broadcastsi128_si256(a: BitVec)); +mk!(_mm_broadcastw_epi16(a: BitVec)); +mk!(_mm256_broadcastw_epi16(a: BitVec)); +mk!(_mm256_cmpeq_epi64(a: BitVec, b: BitVec)); +mk!(_mm256_cmpeq_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_cmpeq_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_cmpeq_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_cmpgt_epi64(a: BitVec, b: BitVec)); +mk!(_mm256_cmpgt_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_cmpgt_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_cmpgt_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_cvtepi16_epi32(a: BitVec)); +mk!(_mm256_cvtepi16_epi64(a: BitVec)); +mk!(_mm256_cvtepi32_epi64(a: BitVec)); +mk!(_mm256_cvtepi8_epi16(a: BitVec)); +mk!(_mm256_cvtepi8_epi32(a: BitVec)); +mk!(_mm256_cvtepi8_epi64(a: BitVec)); +mk!(_mm256_cvtepu16_epi32(a: BitVec)); +mk!(_mm256_cvtepu16_epi64(a: BitVec)); +mk!(_mm256_cvtepu32_epi64(a: BitVec)); +mk!(_mm256_cvtepu8_epi16(a: BitVec)); +mk!(_mm256_cvtepu8_epi32(a: BitVec)); +mk!(_mm256_cvtepu8_epi64(a: BitVec)); +mk!(_mm256_extracti128_si256{<0>,<1>}(a: BitVec)); +mk!(_mm256_hadd_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_hadd_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_hadds_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_hsub_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_hsub_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_hsubs_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_inserti128_si256{<0>,<1>}(a: BitVec, b: BitVec)); +mk!(_mm256_madd_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_maddubs_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_max_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_max_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_max_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_max_epu16(a: BitVec, b: BitVec)); +mk!(_mm256_max_epu32(a: BitVec, b: BitVec)); 
+mk!(_mm256_max_epu8(a: BitVec, b: BitVec)); +mk!(_mm256_min_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_min_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_min_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_min_epu16(a: BitVec, b: BitVec)); +mk!(_mm256_min_epu32(a: BitVec, b: BitVec)); +mk!(_mm256_min_epu8(a: BitVec, b: BitVec)); +mk!(_mm256_mul_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_mul_epu32(a: BitVec, b: BitVec)); +mk!(_mm256_mulhi_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_mulhi_epu16(a: BitVec, b: BitVec)); +mk!(_mm256_mullo_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_mullo_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_mulhrs_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_or_si256(a: BitVec, b: BitVec)); +mk!(_mm256_packs_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_packs_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_packus_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_packus_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_permutevar8x32_epi32(a: BitVec, b: BitVec)); +#[test] +fn _mm256_movemask_epi8() { + let n = 1000; + + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_movemask_epi8(a.into()), + unsafe { upstream::_mm256_movemask_epi8(a.into()) } + ); + } +} +mk!([100]_mm256_mpsadbw_epu8{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec, b: BitVec)); + 
+mk!([100]_mm256_permute4x64_epi64{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!([100]_mm256_shuffle_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); 
+mk!([100]_mm256_shufflehi_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!([100]_mm256_shufflelo_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!(_mm256_sad_epu8(a: BitVec, b: BitVec)); +mk!(_mm256_shuffle_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_sign_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_sign_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_sign_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_sll_epi16(a: BitVec, count: BitVec)); +mk!(_mm256_sll_epi32(a: BitVec, count: BitVec)); +mk!(_mm256_sll_epi64(a: BitVec, count: BitVec)); 
+mk!([100]_mm256_slli_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!([100]_mm256_slli_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); 
+mk!([100]_mm256_slli_epi64{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!([100]_mm256_slli_si256{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); 
+mk!([100]_mm256_bslli_epi128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!(_mm_sllv_epi32(a: BitVec, count: BitVec)); +mk!(_mm256_sllv_epi32(a: BitVec, count: BitVec)); +mk!(_mm_sllv_epi64(a: BitVec, count: BitVec)); +mk!(_mm256_sllv_epi64(a: BitVec, count: BitVec)); +mk!(_mm256_sra_epi16(a: BitVec, count: BitVec)); +mk!(_mm256_sra_epi32(a: BitVec, count: BitVec)); +mk!([100]_mm256_srai_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); 
+mk!(_mm256_srai_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!(_mm_srav_epi32(a: BitVec, count: BitVec)); +mk!(_mm256_srav_epi32(a: BitVec, count: BitVec)); +mk!([100]_mm256_srli_si256{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); 
+mk!([100]_mm256_bsrli_epi128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!(_mm256_srl_epi16(a: BitVec, count: BitVec)); +mk!(_mm256_srl_epi32(a: BitVec, count: BitVec)); +mk!(_mm256_srl_epi64(a: BitVec, count: BitVec)); +mk!([100]_mm256_srli_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); 
+mk!([100]_mm256_srli_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!([100]_mm256_srli_epi64{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec)); +mk!(_mm_srlv_epi32(a: BitVec, count: BitVec)); +mk!(_mm256_srlv_epi32(a: BitVec, count: BitVec)); +mk!(_mm_srlv_epi64(a: BitVec, count: BitVec)); +mk!(_mm256_srlv_epi64(a: BitVec, count: BitVec)); +mk!(_mm256_sub_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_sub_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_sub_epi64(a: BitVec, b: BitVec)); +mk!(_mm256_sub_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_subs_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_subs_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_subs_epu16(a: BitVec, b: BitVec)); +mk!(_mm256_subs_epu8(a: BitVec, b: BitVec)); +mk!(_mm256_unpackhi_epi8(a: BitVec, b: BitVec)); 
+mk!(_mm256_unpacklo_epi8(a: BitVec, b: BitVec)); +mk!(_mm256_unpackhi_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_unpacklo_epi16(a: BitVec, b: BitVec)); +mk!(_mm256_unpackhi_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_unpacklo_epi32(a: BitVec, b: BitVec)); +mk!(_mm256_unpackhi_epi64(a: BitVec, b: BitVec)); +mk!(_mm256_unpacklo_epi64(a: BitVec, b: BitVec)); +mk!(_mm256_xor_si256(a: BitVec, b: BitVec)); +#[test] +fn _mm256_extract_epi8() { + let n = 100; + + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<0>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<0>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<1>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<1>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<2>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<2>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<3>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<3>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<4>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<4>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<5>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<5>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<6>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<6>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<7>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<7>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<8>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<8>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<9>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<9>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<10>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<10>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<11>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<11>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<12>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<12>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<13>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<13>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<14>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<14>(a.into()) } + ); + 
} + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<15>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<15>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<16>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<16>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<17>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<17>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<18>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<18>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<19>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<19>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<20>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<20>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<21>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<21>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<22>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<22>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<23>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<23>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<24>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<24>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<25>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<25>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<26>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<26>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<27>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<27>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<28>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<28>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<29>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<29>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<30>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<30>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi8::<31>(a.into()), + unsafe { upstream::_mm256_extract_epi8::<31>(a.into()) } + ); + } +} + +#[test] 
+fn _mm256_extract_epi16() { + let n = 100; + + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<0>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<0>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<1>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<1>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<2>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<2>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<3>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<3>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<4>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<4>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<5>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<5>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<6>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<6>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<7>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<7>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<8>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<8>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<9>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<9>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<10>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<10>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<11>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<11>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<12>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<12>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<13>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<13>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<14>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<14>(a.into()) } + ); + } + for _ in 0..n { + let a: BitVec<256> = BitVec::random(); + assert_eq!( + super::super::models::avx2::_mm256_extract_epi16::<15>(a.into()), + unsafe { upstream::_mm256_extract_epi16::<15>(a.into()) } + ); + } +} diff --git a/testable-simd-models/src/core_arch/x86/tests/mod.rs b/testable-simd-models/src/core_arch/x86/tests/mod.rs new file mode 100644 index 
0000000000000..b5a0c3a449715 --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/tests/mod.rs @@ -0,0 +1,113 @@
+//! Tests for intrinsics defined in `crate::core_arch::x86::models`
+//!
+//! Each and every modelled intrinsic is tested against the Rust
+//! implementation here. For the most part, the tests work by
+//! generating random inputs, passing them as arguments to both the
+//! models in this crate and the corresponding intrinsics in the Rust
+//! core, and then comparing their outputs.
+//!
+//! To add a test for a modelled intrinsic, go to the appropriate file, and
+//! use the `mk!` macro to define it.
+//!
+//! A `mk!` macro invocation has the general shape
+//! `mk!([N]intrinsic_name{<C1>,<C2>,...}(arg1: Type1, arg2: Type2, ...))`.
+//!
+//! For example, some valid invocations are
+//!
+//! `mk!([100]_mm256_extracti128_si256{<0>,<1>}(a: BitVec));`
+//! `mk!(_mm256_extracti128_si256{<0>,<1>}(a: BitVec));`
+//! `mk!(_mm256_abs_epi16(a: BitVec));`
+//!
+//! The number of random tests `[N]` is optional. If not provided, it is taken to be 1000 by default.
+//! The const values are necessary if the function has constant arguments, but should be omitted if not.
+//! The function name and the function arguments are necessary in all cases.
+//!
+//! Note: This only works if the function returns a bit-vector or funarray. If it returns an integer, the
+//! test has to be written manually. It is recommended that the manually defined test follow
+//! the pattern of the tests defined via the `mk!` invocation. It is also recommended that, in the
+//! case that the intrinsic takes constant arguments, each and every possible constant value
+//! (up to a maximum of 255) that can be passed to the function be used for testing. The number
+//! of constant values passed depends on whether the Rust intrinsic statically asserts that the
+//! length of the constant argument is less than or equal to a certain number of bits.
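+//!
+//! As a rough sketch, a manually written test for an integer-returning intrinsic
+//! (the existing `_mm256_movemask_epi8` test in `avx2.rs` follows this shape) looks like:
+//!
+//! ```ignore
+//! #[test]
+//! fn _mm256_movemask_epi8() {
+//!     for _ in 0..1000 {
+//!         let a: BitVec<256> = BitVec::random();
+//!         assert_eq!(
+//!             super::super::models::avx2::_mm256_movemask_epi8(a.into()),
+//!             unsafe { upstream::_mm256_movemask_epi8(a.into()) }
+//!         );
+//!     }
+//! }
+//! ```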
+
+mod avx;
+mod avx2;
+mod sse2;
+mod ssse3;
+use crate::abstractions::bitvec::*;
+
+pub(crate) mod types {
+    use crate::abstractions::bitvec::*;
+
+    #[allow(non_camel_case_types)]
+    pub type __m256i = BitVec<256>;
+    #[allow(non_camel_case_types)]
+    pub type __m256 = BitVec<256>;
+    #[allow(non_camel_case_types)]
+    pub type __m128i = BitVec<128>;
+}
+
+pub(crate) mod upstream {
+    #[cfg(target_arch = "x86")]
+    pub use core::arch::x86::*;
+    #[cfg(target_arch = "x86_64")]
+    pub use core::arch::x86_64::*;
+}
+
+mod conversions {
+    use super::upstream::{
+        __m128i, __m256, __m256i, _mm256_castps_si256, _mm256_castsi256_ps, _mm256_loadu_si256,
+        _mm256_storeu_si256, _mm_loadu_si128, _mm_storeu_si128,
+    };
+    use super::BitVec;
+
+    impl From<BitVec<256>> for __m256i {
+        fn from(bv: BitVec<256>) -> __m256i {
+            let bv: &[u8] = &bv.to_vec()[..];
+            unsafe { _mm256_loadu_si256(bv.as_ptr() as *const _) }
+        }
+    }
+    impl From<BitVec<256>> for __m256 {
+        fn from(bv: BitVec<256>) -> __m256 {
+            let bv: &[u8] = &bv.to_vec()[..];
+            unsafe { _mm256_castsi256_ps(_mm256_loadu_si256(bv.as_ptr() as *const _)) }
+        }
+    }
+
+    impl From<BitVec<128>> for __m128i {
+        fn from(bv: BitVec<128>) -> __m128i {
+            let slice: &[u8] = &bv.to_vec()[..];
+            unsafe { _mm_loadu_si128(slice.as_ptr() as *const __m128i) }
+        }
+    }
+
+    impl From<__m256i> for BitVec<256> {
+        fn from(vec: __m256i) -> BitVec<256> {
+            let mut v = [0u8; 32];
+            unsafe {
+                _mm256_storeu_si256(v.as_mut_ptr() as *mut _, vec);
+            }
+            BitVec::from_slice(&v[..], 8)
+        }
+    }
+
+    impl From<__m256> for BitVec<256> {
+        fn from(vec: __m256) -> BitVec<256> {
+            let mut v = [0u8; 32];
+            unsafe {
+                _mm256_storeu_si256(v.as_mut_ptr() as *mut _, _mm256_castps_si256(vec));
+            }
+            BitVec::from_slice(&v[..], 8)
+        }
+    }
+
+    impl From<__m128i> for BitVec<128> {
+        fn from(vec: __m128i) -> BitVec<128> {
+            let mut v = [0u8; 16];
+            unsafe {
+                _mm_storeu_si128(v.as_mut_ptr() as *mut _, vec);
+            }
+            BitVec::from_slice(&v[..], 8)
+        }
+    }
+}
diff --git a/testable-simd-models/src/core_arch/x86/tests/sse2.rs b/testable-simd-models/src/core_arch/x86/tests/sse2.rs
new file mode 100644
index 0000000000000..ed387f5938524
--- /dev/null
+++ b/testable-simd-models/src/core_arch/x86/tests/sse2.rs
@@ -0,0 +1,201 @@
+use super::types::*;
+use super::upstream;
+use crate::abstractions::bitvec::BitVec;
+use crate::helpers::test::HasRandom;
+
+/// Derives tests for a given intrinsic. Tests that the intrinsic and its model compute the same thing over random values (1000 by default).
+macro_rules! mk {
+    ($([$N:literal])?$name:ident$({$(<$($c:literal),*>),*})?($($x:ident : $ty:ident),*)) => {
+        #[test]
+        fn $name() {
+            #[allow(unused)]
+            const N: usize = {
+                let n: usize = 1000;
+                $(let n: usize = $N;)?
+ n + }; + mk!(@[N]$name$($(<$($c),*>)*)?($($x : $ty),*)); + } + }; + (@[$N:ident]$name:ident$(<$($c:literal),*>)?($($x:ident : $ty:ident),*)) => { + for _ in 0..$N { + $(let $x = $ty::random();)* + assert_eq!(super::super::models::sse2::$name$(::<$($c,)*>)?($($x.into(),)*), unsafe { + BitVec::from(upstream::$name$(::<$($c,)*>)?($($x.into(),)*)).into() + }); + } + }; + (@[$N:ident]$name:ident<$($c1:literal),*>$(<$($c:literal),*>)*($($x:ident : $ty:ident),*)) => { + let one = || { + mk!(@[$N]$name<$($c1),*>($($x : $ty),*)); + }; + one(); + mk!(@[$N]$name$(<$($c),*>)*($($x : $ty),*)); + } +} +mk!(_mm_add_epi8(a: __m128i, b: __m128i)); +mk!(_mm_add_epi16(a: __m128i, b: __m128i)); +mk!(_mm_add_epi32(a: __m128i, b: __m128i)); +mk!(_mm_add_epi64(a: __m128i, b: __m128i)); +mk!(_mm_adds_epi8(a: __m128i, b: __m128i)); +mk!(_mm_adds_epi16(a: __m128i, b: __m128i)); +mk!(_mm_adds_epu8(a: __m128i, b: __m128i)); +mk!(_mm_adds_epu16(a: __m128i, b: __m128i)); +mk!(_mm_avg_epu8(a: __m128i, b: __m128i)); +mk!(_mm_avg_epu16(a: __m128i, b: __m128i)); +mk!(_mm_madd_epi16(a: __m128i, b: __m128i)); +mk!(_mm_max_epi16(a: __m128i, b: __m128i)); +mk!(_mm_max_epu8(a: __m128i, b: __m128i)); +mk!(_mm_min_epi16(a: __m128i, b: __m128i)); +mk!(_mm_min_epu8(a: __m128i, b: __m128i)); +mk!(_mm_mulhi_epi16(a: __m128i, b: __m128i)); +mk!(_mm_mulhi_epu16(a: __m128i, b: __m128i)); +mk!(_mm_mullo_epi16(a: __m128i, b: __m128i)); +mk!(_mm_mul_epu32(a: __m128i, b: __m128i)); +mk!(_mm_sad_epu8(a: __m128i, b: __m128i)); +mk!(_mm_sub_epi8(a: __m128i, b: __m128i)); +mk!(_mm_sub_epi16(a: __m128i, b: __m128i)); +mk!(_mm_sub_epi32(a: __m128i, b: __m128i)); +mk!(_mm_sub_epi64(a: __m128i, b: __m128i)); +mk!(_mm_subs_epi8(a: __m128i, b: __m128i)); +mk!(_mm_subs_epi16(a: __m128i, b: __m128i)); +mk!(_mm_subs_epu8(a: __m128i, b: __m128i)); +mk!(_mm_subs_epu16(a: __m128i, b: __m128i)); + +mk!([100]_mm_slli_si128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); 
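+// For orientation, an invocation such as `mk!(_mm_add_epi8(a: __m128i, b: __m128i))`
+// above expands to roughly the following test. This is an illustrative sketch of
+// the macro's output, not its literal expansion:
+//
+// #[test]
+// fn _mm_add_epi8() {
+//     const N: usize = 1000; // or the `[N]` override, e.g. `[100]`
+//     for _ in 0..N {
+//         let a = __m128i::random();
+//         let b = __m128i::random();
+//         assert_eq!(
+//             super::super::models::sse2::_mm_add_epi8(a.into(), b.into()),
+//             unsafe { BitVec::from(upstream::_mm_add_epi8(a.into(), b.into())).into() }
+//         );
+//     }
+// }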
+mk!([100]_mm_bslli_si128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); +mk!([100]_mm_bsrli_si128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); 
+mk!([100]_mm_slli_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_sll_epi16(a: __m128i, count: __m128i)); + +mk!([100]_mm_slli_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_sll_epi32(a: __m128i, count: __m128i)); + 
+mk!([100]_mm_slli_epi64{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_sll_epi64(a: __m128i, count: __m128i)); + +mk!([100]_mm_srai_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_sra_epi16(a: __m128i, count: __m128i)); + 
+mk!([100]_mm_srai_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_sra_epi32(a: __m128i, count: __m128i)); +mk!([100]_mm_srli_si128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); 
+mk!([100]_mm_srli_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_srl_epi16(a: __m128i, count: __m128i)); + +mk!([100]_mm_srli_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!([100]_mm_srl_epi32(a: __m128i, count: __m128i)); + 
+mk!([100]_mm_srli_epi64{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); + +mk!(_mm_srl_epi64(a: __m128i, count: __m128i)); +mk!(_mm_and_si128(a: __m128i, b: __m128i)); +mk!(_mm_andnot_si128(a: __m128i, b: __m128i)); +mk!(_mm_or_si128(a: __m128i, b: __m128i)); +mk!(_mm_xor_si128(a: __m128i, b: __m128i)); +mk!(_mm_cmpeq_epi8(a: __m128i, b: __m128i)); +mk!(_mm_cmpeq_epi16(a: __m128i, b: __m128i)); +mk!(_mm_cmpeq_epi32(a: __m128i, b: __m128i)); +mk!(_mm_cmpgt_epi8(a: __m128i, b: __m128i)); +mk!(_mm_cmpgt_epi16(a: __m128i, b: __m128i)); +mk!(_mm_cmpgt_epi32(a: __m128i, b: __m128i)); +mk!(_mm_cmplt_epi8(a: __m128i, b: __m128i)); +mk!(_mm_cmplt_epi16(a: __m128i, b: __m128i)); +mk!(_mm_cmplt_epi32(a: __m128i, b: __m128i)); +mk!(_mm_cvtsi32_si128(a: i32)); + +// mk!(_mm_cvtsi128_si32(a: __m128i)); + +mk!(_mm_set_epi64x(e1: i64, e0: i64)); +mk!(_mm_set_epi32(e3: i32, e2: i32, e1: i32, e0: i32)); +mk!(_mm_set_epi16( + e7: i16, + e6: i16, + e5: i16, + e4: i16, + e3: i16, + e2: i16, + e1: i16, + e0: i16 +)); +mk!(_mm_set_epi8( + e15: i8, + e14: i8, + e13: i8, + e12: i8, + e11: i8, + e10: i8, + e9: i8, + e8: i8, + e7: i8, + e6: i8, + e5: i8, + e4: i8, + e3: i8, + e2: i8, + e1: i8, + e0: i8 +)); +mk!(_mm_set1_epi64x(a: i64)); +mk!(_mm_set1_epi32(a: i32)); +mk!(_mm_set1_epi16(a: i16)); +mk!(_mm_set1_epi8(a: i8)); +mk!(_mm_setr_epi32(e3: i32, e2: i32, e1: i32, e0: i32)); +mk!(_mm_setr_epi16( + e7: i16, + e6: i16, + e5: i16, + e4: i16, + e3: i16, + e2: i16, + e1: i16, + e0: i16 +)); +mk!(_mm_setr_epi8( + e15: i8, + e14: i8, + e13: i8, + e12: i8, + e11: i8, + e10: i8, + e9: i8, + e8: i8, + e7: i8, + e6: i8, + e5: i8, + e4: i8, + e3: i8, + e2: i8, + e1: i8, + e0: i8 +)); +mk!(_mm_setzero_si128()); +mk!(_mm_move_epi64(a: __m128i)); +mk!(_mm_packs_epi16(a: __m128i, b: __m128i)); +mk!(_mm_packs_epi32(a: __m128i, b: __m128i)); +mk!(_mm_packus_epi16(a: __m128i, b: __m128i)); + +// mk!([100]_mm_extract_epi16(a: __m128i)); +mk!([100]_mm_insert_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>}(a: __m128i, i: i32)); + +// mk!([100]_mm_movemask_epi8(a: __m128i)); + 
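+// Intrinsics that return a plain integer (such as the commented-out
+// `_mm_extract_epi16`, `_mm_cvtsi128_si32`, and `_mm_movemask_epi8` above) cannot
+// be derived with `mk!`; per the note in the module documentation they need a
+// hand-written test following the same pattern. The sketch below is illustrative
+// only and assumes a model `_mm_movemask_epi8` exists in `models::sse2` and
+// returns `i32`:
+//
+// #[test]
+// fn _mm_movemask_epi8() {
+//     for _ in 0..100 {
+//         let a = __m128i::random();
+//         let model = super::super::models::sse2::_mm_movemask_epi8(a.into());
+//         let upstream = unsafe { upstream::_mm_movemask_epi8(a.into()) };
+//         assert_eq!(model, upstream);
+//     }
+// }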
+mk!([100]_mm_shuffle_epi32{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); +mk!([100]_mm_shufflehi_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); 
+mk!([100]_mm_shufflelo_epi16{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i)); +mk!(_mm_unpackhi_epi8(a: __m128i, b: __m128i)); +mk!(_mm_unpackhi_epi16(a: __m128i, b: __m128i)); +mk!(_mm_unpackhi_epi32(a: __m128i, b: __m128i)); +mk!(_mm_unpackhi_epi64(a: __m128i, b: __m128i)); +mk!(_mm_unpacklo_epi8(a: __m128i, b: __m128i)); +mk!(_mm_unpacklo_epi16(a: __m128i, b: __m128i)); +mk!(_mm_unpacklo_epi32(a: __m128i, b: __m128i)); +mk!(_mm_unpacklo_epi64(a: __m128i, b: __m128i)); +mk!(_mm_undefined_si128()); diff --git a/testable-simd-models/src/core_arch/x86/tests/ssse3.rs new file mode 100644 index 0000000000000..6382f953f2063 --- /dev/null +++ b/testable-simd-models/src/core_arch/x86/tests/ssse3.rs @@ -0,0 +1,51 @@ +use super::types::*; +use super::upstream; +use crate::abstractions::bitvec::BitVec; +use crate::helpers::test::HasRandom; + +/// Derives a test for a given intrinsic. Tests that the intrinsic and its model compute the same result over random inputs (1000 by default). +macro_rules! mk { + ($([$N:literal])?$name:ident$({$(<$($c:literal),*>),*})?($($x:ident : $ty:ident),*)) => { + #[test] + fn $name() { + #[allow(unused)] + const N: usize = { + let n: usize = 1000; + $(let n: usize = $N;)? 
+ n + }; + mk!(@[N]$name$($(<$($c),*>)*)?($($x : $ty),*)); + } + }; + (@[$N:ident]$name:ident$(<$($c:literal),*>)?($($x:ident : $ty:ident),*)) => { + for _ in 0..$N { + $(let $x = $ty::random();)* + assert_eq!(super::super::models::ssse3::$name$(::<$($c,)*>)?($($x.into(),)*), unsafe { + BitVec::from(upstream::$name$(::<$($c,)*>)?($($x.into(),)*)).into() + }); + } + }; + (@[$N:ident]$name:ident<$($c1:literal),*>$(<$($c:literal),*>)*($($x:ident : $ty:ident),*)) => { + let one = || { + mk!(@[$N]$name<$($c1),*>($($x : $ty),*)); + }; + one(); + mk!(@[$N]$name$(<$($c),*>)*($($x : $ty),*)); + } +} +mk!(_mm_abs_epi8(a: __m128i)); +mk!(_mm_abs_epi16(a: __m128i)); +mk!(_mm_abs_epi32(a: __m128i)); +mk!(_mm_shuffle_epi8(a: __m128i, b: __m128i)); +mk!([100]_mm_alignr_epi8{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: __m128i, b: __m128i)); +mk!(_mm_hadd_epi16(a: __m128i, b: __m128i)); +mk!(_mm_hadds_epi16(a: __m128i, b: __m128i)); +mk!(_mm_hadd_epi32(a: __m128i, b: __m128i)); +mk!(_mm_hsub_epi16(a: __m128i, b: __m128i)); +mk!(_mm_hsubs_epi16(a: __m128i, b: __m128i)); +mk!(_mm_hsub_epi32(a: __m128i, b: __m128i)); +mk!(_mm_maddubs_epi16(a: __m128i, b: __m128i)); +mk!(_mm_mulhrs_epi16(a: __m128i, b: __m128i)); +mk!(_mm_sign_epi8(a: __m128i, b: __m128i)); +mk!(_mm_sign_epi16(a: __m128i, b: __m128i)); +mk!(_mm_sign_epi32(a: __m128i, b: __m128i)); diff --git a/testable-simd-models/src/helpers.rs b/testable-simd-models/src/helpers.rs new file mode 100644 index 0000000000000..6c5e84e2a8dbd --- /dev/null +++ b/testable-simd-models/src/helpers.rs @@ -0,0 +1,55 @@ +#[cfg(test)] +pub mod test { + use crate::abstractions::{bit::Bit, bitvec::BitVec, funarr::FunArray}; + use rand::prelude::*; + + /// Helper trait to generate random values + pub trait HasRandom { + fn random() -> Self; + } + macro_rules! 
mk_has_random { + ($($ty:ty),*) => { + $(impl HasRandom for $ty { + fn random() -> Self { + let mut rng = rand::rng(); + rng.random() + } + })* + }; + } + + mk_has_random!(bool); + mk_has_random!(i8, i16, i32, i64, i128); + mk_has_random!(u8, u16, u32, u64, u128); + + impl HasRandom for isize { + fn random() -> Self { + i128::random() as isize + } + } + impl HasRandom for usize { + fn random() -> Self { + i128::random() as usize + } + } + + impl HasRandom for Bit { + fn random() -> Self { + crate::abstractions::bit::Bit::from(bool::random()) + } + } + impl<const N: u64> HasRandom for BitVec<N> { + fn random() -> Self { + Self::from_fn(|_| Bit::random()) + } + } + + impl<const N: u64, T: HasRandom> HasRandom for FunArray<N, T> { + fn random() -> Self { + FunArray::from_fn(|_| T::random()) + } + } +} + +#[cfg(test)] +pub use test::*; diff --git a/testable-simd-models/src/lib.rs new file mode 100644 index 0000000000000..fc76194526e20 --- /dev/null +++ b/testable-simd-models/src/lib.rs @@ -0,0 +1,35 @@ +//! `testable-simd-models`: A Rust Model for the `core` Library +//! +//! `testable-simd-models` is a simplified, self-contained model of Rust’s `core` library. It aims to provide +//! a purely Rust-based specification of `core`'s fundamental operations, making them easier to +//! understand, analyze, and formally verify. Unlike `core`, which may rely on platform-specific +//! intrinsics and compiler magic, `testable-simd-models` expresses everything in plain Rust, prioritizing +//! clarity and explicitness over efficiency. +//! +//! ## Key Features +//! +//! - **Partial Modeling**: `testable-simd-models` includes only a subset of `core`, focusing on modeling +//! fundamental operations rather than providing a complete replacement. +//! - **Exact Signatures**: Any item that exists in both `testable-simd-models` and `core` has the same type signature, +//! ensuring compatibility with formal verification efforts. +//! - **Purely Functional Approach**: Where possible, `testable-simd-models` favors functional programming principles, +//! avoiding unnecessary mutation and side effects to facilitate formal reasoning. +//! - **Explicit Implementations**: Even low-level operations, such as SIMD, are modeled explicitly using +//! Rust constructs like bit arrays and partial maps. +//! - **Extra Abstractions**: `testable-simd-models` includes additional helper types and functions to support +//! modeling. These extra items are marked appropriately to distinguish them from `core` definitions. +//! +//! ## Intended Use +//! +//! `testable-simd-models` is designed as a reference model for formal verification and reasoning about Rust programs. +//! By providing a readable, testable, well-specified version of `core`'s behavior, it serves as a foundation for +//! proof assistants and other verification tools. + +// This recursion limit is necessary for the `mk!` macro used for tests. +// We test functions with const generics; the macro generates a test per possible (const generic) control value. +#![recursion_limit = "4096"] +pub mod abstractions; +pub mod core_arch; + +pub use core_arch as arch; +pub mod helpers; diff --git a/testable-simd-models/test.sh new file mode 100755 index 0000000000000..8f521735122c3 --- /dev/null +++ b/testable-simd-models/test.sh @@ -0,0 +1,2 @@ +cross test --target aarch64-unknown-linux-gnu +cross test --target x86_64-unknown-linux-gnu