Hardware-accelerated inverse floating point square root #156

Open
wants to merge 2 commits into
base: master

Conversation

Kuratius
Contributor

@Kuratius Kuratius commented Jan 19, 2025

https://github.com/Aikku93 wrote a software divide that can keep up with the hardware square root unit, and wrote this implementation in assembly.
They asked me to open a pull request for it. This is a draft for now because I still need to update the hw_math test as well.

Some general information on the characteristics of this function:
Execution time seems to be about 37 bus cycles on DSi and 61-62 bus cycles on DS.
Accuracy should be within 1 ulp of the result you would get by doing the calculation in doubles and then casting to single precision, which should be more accurate than a simple 1.0f/sqrtf(x).
The result is exact (well, identical to that reference) for ~89% of possible mantissas.

Considerations: In this version the LUT is in main memory, but it may be desirable to place it in TCM on DS. I believe on DSi it doesn't matter where you put it, because the overall execution time is limited by the hardware square root. Alignment to cache lines could also be a consideration.
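
As a rough illustration of the cache-line alignment point, the sketch below declares a table aligned to a 32-byte boundary and checks how many lines it spans. The 32-byte line size, the table name, the entry count, and the zeroed contents are all assumptions made for illustration; the real LUT in this PR may differ in every one of them. Placing the table in TCM would additionally need a toolchain-specific section attribute, which is left out here.

```c
#include <stdint.h>

/* Assumed cache line size in bytes; check the target CPU's documentation. */
#define CACHE_LINE_BYTES 32
/* Hypothetical entry count, chosen only for this example. */
#define RSQRT_LUT_ENTRIES 64

/* Placeholder table (contents zeroed); aligned so it starts on a cache line boundary. */
static _Alignas(CACHE_LINE_BYTES) const uint32_t rsqrt_lut[RSQRT_LUT_ENTRIES] = {0};

/* Returns nonzero when the table starts exactly on a cache line boundary. */
static int lut_is_line_aligned(void) {
    return ((uintptr_t)rsqrt_lut % CACHE_LINE_BYTES) == 0;
}

/* Number of cache lines the table occupies when it is line-aligned. */
static unsigned lut_cache_lines(void) {
    return (unsigned)((sizeof rsqrt_lut + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES);
}
```

Without the alignment, a table of this size could straddle one extra cache line, which costs one more line fill on a cold lookup.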

@Kuratius
Contributor Author

Kuratius commented Jan 19, 2025

Apologies, this was not intended to be two commits.

@Kuratius
Contributor Author

There's potentially a bug in the Inf or NaN branch; I'll add a test for it to the other PR and check if the fix suggested by aikku works.
