Aarch64 simd #65

Draft
wants to merge 21 commits into
base: main

Conversation

Dr-Emann
Collaborator

@Dr-Emann Dr-Emann commented May 21, 2025

Builds on #57

Closes #61

Still a work in progress, but opening as a draft for early feedback

Dr-Emann and others added 21 commits October 17, 2023 21:33
Remove unneeded `extern crate`s
Also, add some benchmarks against memchr
Add a test that looks for the first item in a long haystack
The memmap crate is unmaintained; use the maintained memmap2 crate instead
Structs don't need the bounds, only the implementations
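As a minimal sketch of what moving bounds from a struct to its implementations looks like (the `Wrapper` type and its methods are hypothetical and not taken from this crate):

```rust
use std::fmt::Debug;

// Hypothetical type for illustration: the struct itself carries no bound,
// so it can be named and constructed for any `T`.
pub struct Wrapper<T> {
    inner: T,
}

impl<T> Wrapper<T> {
    pub fn new(inner: T) -> Self {
        Wrapper { inner }
    }
}

// The bound appears only on the impl block that actually needs it.
impl<T: Debug> Wrapper<T> {
    pub fn describe(&self) -> String {
        format!("{:?}", self.inner)
    }
}
```

Keeping the bound off the struct means downstream code that only stores or passes the type around does not have to repeat the bound.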
Mostly just adding #[must_use]
This speeds up the criterion benchmarks by almost 2x

I believe this is needed because e.g. Bytes::find is inlined, and calls `find`
generically, which in turn calls PackedCompareControl methods. The code calling
those methods is inlined into the calling crate, but the implementations of
PackedCompareControl are not accessible to code in the calling crate,
so they end up as actual function calls. However, these functions are
_super_ simple, and inlining them helps a LOT, so adding `#[inline]` to them
and making their implementations available to calling crates has a
huge effect.

This was only seen when moving to criterion because previously, the nightly
benchmarks were implemented in the library crate itself, so these functions
were already eligible for inlining. The criterion results were actually more
accurate to what callers of the crate would see!
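The situation described in this commit message can be sketched roughly as follows. The `ByteMatcher` trait, the `OneOf` type, and this `find` function are hypothetical stand-ins, not the crate's real `PackedCompareControl` API:

```rust
// Hypothetical stand-in for a control trait such as PackedCompareControl.
pub trait ByteMatcher {
    fn matches(&self, byte: u8) -> bool;
}

// Hypothetical implementor: matches any byte contained in the slice.
pub struct OneOf<'a>(pub &'a [u8]);

impl ByteMatcher for OneOf<'_> {
    // This method has no type parameters, so without `#[inline]` its body is
    // typically not available for inlining in downstream crates (absent LTO),
    // and calls from generic code instantiated there stay real function calls.
    #[inline]
    fn matches(&self, byte: u8) -> bool {
        self.0.contains(&byte)
    }
}

// Generic over the matcher, so it is instantiated in the calling crate.
#[inline]
pub fn find<M: ByteMatcher>(haystack: &[u8], matcher: &M) -> Option<usize> {
    haystack.iter().position(|&b| matcher.matches(b))
}
```

Whether the compiler inlines a given call is still up to the optimizer; `#[inline]` only makes the body available across crate boundaries and acts as a hint.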
Per suggestion from @BurntSushi [here](tafia/quick-xml#664 (comment))

On my M1, it appears to be slower but competitive with memchr up to memchr3,
then starts being the faster of the two from 5-16
We may not want to be stuck with const-constructable implementations
Move the simd-only tests to the top level

This allows testing even when sse4.2 isn't enabled: when it is
available, it will still test the simd implementation, but will test the
fallback otherwise.
This moves mentions of "simd" to be x86 specific. Also, do everything
with #[cfg], rather than requiring custom cfgs populated in the build.rs
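A rough sketch of selecting an implementation with built-in `#[cfg]` predicates only, with no custom cfgs emitted from a build script. The `backend` function and the exact predicates are assumptions for illustration; the crate's real module layout may differ:

```rust
// Compiled only when targeting x86_64 with SSE4.2 enabled at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "sse4.2"))]
pub fn backend() -> &'static str {
    "x86_64 sse4.2"
}

// Compiled only when targeting aarch64.
#[cfg(target_arch = "aarch64")]
pub fn backend() -> &'static str {
    "aarch64 neon"
}

// Portable fallback for every other target.
#[cfg(not(any(
    all(target_arch = "x86_64", target_feature = "sse4.2"),
    target_arch = "aarch64"
)))]
pub fn backend() -> &'static str {
    "fallback"
}
```

Exactly one of the three definitions is compiled for any given target, so callers can use `backend()` unconditionally.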
This includes pretty frequent instances
For aarch64, we can do quite a bit better than just calling the `find`
function repeatedly: we build a 64-bit bitset recording, for each of 64
haystack bytes, whether it matches the set of bytes we're looking for. We can
then efficiently iterate over the set bits.

It may be possible to do something similar in the x86 simd
implementation.
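The bitset idea can be sketched in scalar Rust as below. This is only an illustration under assumptions: the `match_mask` and `find_all` names are hypothetical, and the real aarch64 code would presumably build the mask with NEON instructions rather than a byte-by-byte loop.

```rust
/// Build a mask for a chunk of at most 64 bytes: bit `i` is set when
/// `chunk[i]` is one of `needles`. Scalar stand-in for a SIMD comparison.
fn match_mask(chunk: &[u8], needles: &[u8]) -> u64 {
    debug_assert!(chunk.len() <= 64);
    let mut mask = 0u64;
    for (i, b) in chunk.iter().enumerate() {
        if needles.contains(b) {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Collect the indices of all matching bytes, 64 haystack bytes at a time.
fn find_all(haystack: &[u8], needles: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    for (chunk_idx, chunk) in haystack.chunks(64).enumerate() {
        let mut mask = match_mask(chunk, needles);
        while mask != 0 {
            // Position of the lowest set bit is the next match in this chunk.
            let bit = mask.trailing_zeros() as usize;
            out.push(chunk_idx * 64 + bit);
            // Clear the lowest set bit and continue with the next one.
            mask &= mask - 1;
        }
    }
    out
}
```

Each match costs one `trailing_zeros` and one bit-clear, instead of restarting a search from the previous match position.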
@Dr-Emann
Collaborator Author

rust-lang/rust#127481

Looks like the unstable `Pattern` API changed

Development

Successfully merging this pull request may close these issues.

aarch64 simd implementation
2 participants