-
Notifications
You must be signed in to change notification settings - Fork 285
Add SquareRoot and Logarithm to SVE microbenchmark #5021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @LoopedBard3 , here is another benchmark from the series of SVE benchmarks for review. Kindly take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds two new SVE (Scalable Vector Extension) benchmark files for Arm64 architecture: one for square root operations and one for logarithm operations. These benchmarks compare scalar implementations against Vector128 and SVE-based vectorized implementations.
- Implements SquareRoot benchmarks with scalar, Vector128, and two SVE variants
- Implements Logarithm benchmarks with scalar, Vector128, and SVE implementations using Arm's optimized-routines algorithm
- Both benchmarks include verification logic to ensure correctness of vectorized implementations
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/benchmarks/micro/sve/SquareRoot.cs | Adds benchmark comparing scalar, Vector128, and SVE square root implementations with tail handling strategies |
| src/benchmarks/micro/sve/Logarithm.cs | Adds benchmark for logarithm calculation using optimized polynomial approximation from Arm's optimized-routines library |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| // Since pLoop is a Vector<uint> predicate, we load the input as uint array, | ||
| // then cast it back to Vector<float>. | ||
| // This is preferrable to casting pLoop to Vector<float>, which would cause |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'preferrable' to 'preferable'.
| // This is preferrable to casting pLoop to Vector<float>, which would cause | |
| // This is preferable to casting pLoop to Vector<float>, which would cause |
| // Since pLoop is a Vector<uint> predicate, we load the input as uint array, | ||
| // then cast it back to Vector<float>. | ||
| // This is preferrable to casting pLoop to Vector<float>, which would cause | ||
| // a unnecessary conversion from predicate to vector in the codegen. |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected grammar: 'a unnecessary' should be 'an unnecessary'.
| // a unnecessary conversion from predicate to vector in the codegen. | |
| // an unnecessary conversion from predicate to vector in the codegen. |
| Vector128<uint> u = AdvSimd.And(u_off, Vector128.Create(0x007fffffu)); | ||
| u = AdvSimd.Add(u, offVec); | ||
|
|
||
| Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f)); |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Sve.Subtract with Vector128 types is incorrect. Should use AdvSimd.Subtract instead to match the pattern used elsewhere in this method (lines 105, 114, 118) where AdvSimd operations are used for Vector128 types.
| Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f)); | |
| Vector128<float> r = AdvSimd.Subtract(u.AsSingle(), Vector128.Create(1.0f)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a fair question, @ylpoonlg is there a reason to choose one over another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a mistake sorry. It only worked because Sve is a subclass of AdvSimd, but AdvSimd is the correct one.
This PR adds the following two SVE benchmarks:
SquareRoot
Results on Nvidia Grace
Logarithm
The algorithm is ported from: https://github.com/ARM-software/optimized-routines/blob/v25.07/math/aarch64/sve/logf.c.
The accuracy is around 3ULP.
@dotnet/arm64-contrib @SwapnilGaikwad