@ylpoonlg
Contributor

This PR adds the following two SVE benchmarks:

SquareRoot

Results on Nvidia Grace

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 14.356 ns | 0.0053 ns | 0.0047 ns | 14.354 ns | 14.349 ns | 14.366 ns | - |
| Vector128SquareRoot | 15 | 5.285 ns | 0.1041 ns | 0.0974 ns | 5.276 ns | 5.177 ns | 5.498 ns | - |
| SveSquareRoot | 15 | 2.717 ns | 0.0254 ns | 0.0225 ns | 2.720 ns | 2.671 ns | 2.743 ns | - |
| SveTail | 15 | 5.406 ns | 0.0170 ns | 0.0159 ns | 5.400 ns | 5.393 ns | 5.441 ns | - |
| Scalar | 127 | 132.888 ns | 0.3102 ns | 0.2901 ns | 132.783 ns | 132.602 ns | 133.404 ns | - |
| Vector128SquareRoot | 127 | 35.431 ns | 0.0305 ns | 0.0286 ns | 35.419 ns | 35.402 ns | 35.478 ns | - |
| SveSquareRoot | 127 | 32.119 ns | 0.0049 ns | 0.0041 ns | 32.120 ns | 32.114 ns | 32.125 ns | - |
| SveTail | 127 | 35.485 ns | 0.0202 ns | 0.0189 ns | 35.479 ns | 35.467 ns | 35.529 ns | - |
| Scalar | 527 | 557.405 ns | 0.3753 ns | 0.3511 ns | 557.235 ns | 557.096 ns | 558.034 ns | - |
| Vector128SquareRoot | 527 | 141.569 ns | 0.0777 ns | 0.0727 ns | 141.537 ns | 141.494 ns | 141.729 ns | - |
| SveSquareRoot | 527 | 138.183 ns | 0.0420 ns | 0.0372 ns | 138.173 ns | 138.143 ns | 138.268 ns | - |
| SveTail | 527 | 141.521 ns | 0.0509 ns | 0.0451 ns | 141.503 ns | 141.473 ns | 141.610 ns | - |
| Scalar | 10015 | 10,589.862 ns | 6.5392 ns | 5.1054 ns | 10,587.659 ns | 10,586.120 ns | 10,601.373 ns | - |
| Vector128SquareRoot | 10015 | 2,649.680 ns | 0.9145 ns | 0.8107 ns | 2,649.360 ns | 2,648.837 ns | 2,651.312 ns | - |
| SveSquareRoot | 10015 | 2,648.157 ns | 0.6936 ns | 0.6149 ns | 2,648.013 ns | 2,647.016 ns | 2,649.408 ns | - |
| SveTail | 10015 | 2,650.875 ns | 0.7506 ns | 0.6268 ns | 2,650.927 ns | 2,649.866 ns | 2,652.168 ns | - |
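
Every benchmarked size (15, 127, 527, 10015) leaves a remainder of 3 when divided by 4, so a 128-bit loop over `float` always ends with leftover elements. The benchmark source is not reproduced in this conversation, so the following is only a generic sketch of the "vector main loop plus scalar tail" pattern, written with the portable `Vector128` helpers rather than the SVE intrinsics. The class and method names (`SqrtSketch`, `Scalar`, `VectorWithTail`) are hypothetical and are not taken from the PR.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration only; not the code from SquareRoot.cs.
static class SqrtSketch
{
    // Baseline: one square root per iteration.
    public static void Scalar(ReadOnlySpan<float> input, Span<float> output)
    {
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = MathF.Sqrt(input[i]);
        }
    }

    // Main loop handles four floats at a time; a scalar loop finishes the tail.
    public static void VectorWithTail(ReadOnlySpan<float> input, Span<float> output)
    {
        int width = Vector128<float>.Count; // 4 lanes of float
        int i = 0;

        for (; i <= input.Length - width; i += width)
        {
            Vector128<float> v = Vector128.Create(input.Slice(i, width));
            Vector128.Sqrt(v).CopyTo(output.Slice(i, width));
        }

        // Tail: the 0..3 elements that do not fill a whole vector.
        for (; i < input.Length; i++)
        {
            output[i] = MathF.Sqrt(input[i]);
        }
    }
}
```

With a 15-element input, `VectorWithTail` would run three vector iterations and three scalar ones, and should produce the same values as `Scalar`. A predicated SVE loop can instead mask off the inactive lanes in its final iteration, which is one way to avoid a separate tail loop.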

Logarithm

The algorithm is ported from: https://github.com/ARM-software/optimized-routines/blob/v25.07/math/aarch64/sve/logf.c.
The accuracy is around 3 ULP.
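
For readers unfamiliar with this style of routine, here is a scalar sketch of the range-reduction step only. It splits a positive, normal `x` into an integer exponent `n` and a small remainder `r` such that `x = 2^n * (1 + r)`, so that `log(x) = n * ln(2) + log(1 + r)`. The mask `0x007fffff` and the subtraction of `1.0f` match the snippet quoted in the review of this PR; the offset constant `0x3f2aaaab` (roughly 2/3 as a float bit pattern) and all names are assumptions made for illustration. The polynomial that the real routine uses for `log(1 + r)` is not reproduced here and is replaced by a call to `Math.Log` as a stand-in.

```csharp
using System;

// Hypothetical scalar illustration; not the code from Logarithm.cs.
static class LogfSketch
{
    const uint Off = 0x3f2aaaab;          // assumed offset, about 2/3 as float bits
    const uint MantissaMask = 0x007fffff; // low 23 bits, as in the reviewed snippet
    const float Ln2 = 0.6931472f;

    // Splits a positive, normal x into (n, r) with x == 2^n * (1 + r).
    public static (int N, float R) Reduce(float x)
    {
        uint uOff = BitConverter.SingleToUInt32Bits(x) - Off;
        int n = (int)uOff >> 23;               // arithmetic shift keeps the sign
        uint u = (uOff & MantissaMask) + Off;  // rebuild a value near 1
        float r = BitConverter.UInt32BitsToSingle(u) - 1.0f;
        return (n, r);
    }

    // Stand-in evaluation: Math.Log replaces the routine's polynomial in r.
    public static float Log(float x)
    {
        var (n, r) = Reduce(x);
        return n * Ln2 + (float)Math.Log(1.0 + r);
    }
}
```

For example, `LogfSketch.Reduce(10f)` yields `n = 3` and `r = 0.25`, since `10 = 2^3 * 1.25`. Zero, negative, subnormal, infinite, and NaN inputs need separate handling, which this sketch omits.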

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 48.38 ns | 0.241 ns | 0.188 ns | 48.45 ns | 47.96 ns | 48.57 ns | - |
| Vector128Logarithm | 15 | 17.19 ns | 0.021 ns | 0.019 ns | 17.19 ns | 17.16 ns | 17.22 ns | - |
| SveLogarithm | 15 | 20.06 ns | 0.050 ns | 0.046 ns | 20.07 ns | 19.99 ns | 20.12 ns | - |
| Scalar | 127 | 403.73 ns | 3.722 ns | 3.299 ns | 401.57 ns | 401.29 ns | 410.48 ns | - |
| Vector128Logarithm | 127 | 92.86 ns | 0.038 ns | 0.034 ns | 92.84 ns | 92.82 ns | 92.93 ns | - |
| SveLogarithm | 127 | 104.63 ns | 0.165 ns | 0.154 ns | 104.64 ns | 104.27 ns | 104.86 ns | - |
| Scalar | 527 | 1,661.68 ns | 2.010 ns | 1.570 ns | 1,661.00 ns | 1,660.82 ns | 1,666.26 ns | - |
| Vector128Logarithm | 527 | 359.11 ns | 0.301 ns | 0.281 ns | 358.92 ns | 358.80 ns | 359.56 ns | - |
| SveLogarithm | 527 | 399.49 ns | 1.053 ns | 0.985 ns | 399.69 ns | 397.88 ns | 401.24 ns | - |
| Scalar | 10015 | 31,281.10 ns | 43.606 ns | 38.656 ns | 31,271.90 ns | 31,240.84 ns | 31,367.99 ns | - |
| Vector128Logarithm | 10015 | 6,708.43 ns | 2.011 ns | 1.783 ns | 6,707.70 ns | 6,705.88 ns | 6,712.16 ns | - |
| SveLogarithm | 10015 | 7,845.96 ns | 16.922 ns | 15.829 ns | 7,843.81 ns | 7,818.20 ns | 7,875.49 ns | - |

@dotnet/arm64-contrib @SwapnilGaikwad

@ylpoonlg ylpoonlg marked this pull request as ready for review October 27, 2025 15:33
@SwapnilGaikwad

Hi @LoopedBard3, here is another benchmark from the series of SVE benchmarks for review. Kindly take a look.
Thanks.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds two new SVE (Scalable Vector Extension) benchmark files for Arm64 architecture: one for square root operations and one for logarithm operations. These benchmarks compare scalar implementations against Vector128 and SVE-based vectorized implementations.

  • Implements SquareRoot benchmarks with scalar, Vector128, and two SVE variants
  • Implements Logarithm benchmarks with scalar, Vector128, and SVE implementations using Arm's optimized-routines algorithm
  • Both benchmarks include verification logic to ensure correctness of vectorized implementations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/benchmarks/micro/sve/SquareRoot.cs | Adds benchmark comparing scalar, Vector128, and SVE square root implementations with tail handling strategies |
| src/benchmarks/micro/sve/Logarithm.cs | Adds benchmark for logarithm calculation using optimized polynomial approximation from Arm's optimized-routines library |


{
// Since pLoop is a Vector<uint> predicate, we load the input as uint array,
// then cast it back to Vector<float>.
// This is preferrable to casting pLoop to Vector<float>, which would cause

Copilot AI Oct 29, 2025


Corrected spelling of 'preferrable' to 'preferable'.

Suggested change
// This is preferrable to casting pLoop to Vector<float>, which would cause
// This is preferable to casting pLoop to Vector<float>, which would cause

// Since pLoop is a Vector<uint> predicate, we load the input as uint array,
// then cast it back to Vector<float>.
// This is preferrable to casting pLoop to Vector<float>, which would cause
// a unnecessary conversion from predicate to vector in the codegen.

Copilot AI Oct 29, 2025


Corrected grammar: 'a unnecessary' should be 'an unnecessary'.

Suggested change
// a unnecessary conversion from predicate to vector in the codegen.
// an unnecessary conversion from predicate to vector in the codegen.

Vector128<uint> u = AdvSimd.And(u_off, Vector128.Create(0x007fffffu));
u = AdvSimd.Add(u, offVec);

Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f));

Copilot AI Oct 29, 2025


Using Sve.Subtract with Vector128 types is incorrect. Should use AdvSimd.Subtract instead to match the pattern used elsewhere in this method (lines 105, 114, 118) where AdvSimd operations are used for Vector128 types.

Suggested change
Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f));
Vector128<float> r = AdvSimd.Subtract(u.AsSingle(), Vector128.Create(1.0f));

Member


I think this is a fair question, @ylpoonlg is there a reason to choose one over another?

Contributor Author


This was a mistake, sorry. It only compiled because Sve is a subclass of AdvSimd, so Sve.Subtract resolved to the inherited AdvSimd method. AdvSimd.Subtract is the correct call.
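
The reason the original call compiled can be shown with a minimal sketch of how C# resolves static members through a derived class name. The types `BaseIsa`, `ExtendedIsa`, and `Demo` are hypothetical stand-ins for illustration and are not the real intrinsics classes.

```csharp
using System;

// Hypothetical stand-in for a base instruction-set class.
abstract class BaseIsa
{
    public static int Subtract(int a, int b) => a - b;
}

// Hypothetical stand-in for a derived instruction-set class.
// It declares no Subtract(int, int) of its own.
abstract class ExtendedIsa : BaseIsa
{
    public static int Negate(int a) => -a;
}

static class Demo
{
    // Compiles: member lookup on ExtendedIsa finds the inherited static method.
    public static int ViaDerived() => ExtendedIsa.Subtract(5, 3);

    // Same method, named through the class that actually declares it.
    public static int ViaBase() => BaseIsa.Subtract(5, 3);
}
```

Both `Demo.ViaDerived()` and `Demo.ViaBase()` invoke the same method, so behavior is identical. Naming the declaring class makes it clear to the reader which instruction set the call belongs to.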

LoopedBard3 previously approved these changes Oct 29, 2025
@LoopedBard3 LoopedBard3 merged commit 4aa9b56 into dotnet:main Oct 30, 2025
73 of 80 checks passed

4 participants