
Add vector registers to clobber list to prevent compiler optimization. #5203

Open. Wants to merge 2 commits into base: develop


vaiskv (Contributor) commented Apr 3, 2025

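The SME-based SGEMMDIRECT kernel uses the vector (z) registers. Adding them to the inline-asm clobber list tells the compiler that the kernel overwrites them, so it must not keep values in these registers across the kernel's asm block.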

vaiskv (Contributor, PR author) commented Apr 3, 2025

Hi @martin-frbg,
This patch fixes the issue in #5084 where, at -O3, the compiler read values directly from the z registers instead of reloading them from memory, which produced incorrect output. The patch adds the vector registers to the clobber list, telling the compiler that their contents do not survive the kernel's asm block.
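To illustrate what the clobber list does, here is a minimal GNU C extended-asm sketch; it is not code from the SGEMMDIRECT kernel, and the function `double_it()` is hypothetical. On AArch64 it overwrites the NEON register `v0` (the low 128 bits of the SVE/SME register `z0`) and names it in the clobber list, so the compiler will neither keep an unrelated live value in `v0` across the statement nor assign an operand to it. A kernel that writes `z` registers needs equivalent clobbers for every register it touches.

```c
/* Sketch only: a hypothetical helper, not the OpenBLAS SME kernel.
 * On AArch64 the asm uses s0 (lowest lane of v0, itself the low part of z0)
 * as scratch and declares "v0" clobbered so the compiler does not assume
 * v0 still holds whatever it put there before the statement. */
float double_it(float x)
{
#if defined(__aarch64__)
    float out;
    __asm__ volatile(
        "fmov s0, %s[in]\n\t"   /* copy the input into scratch register s0 */
        "fadd s0, s0, s0\n\t"   /* s0 = x + x */
        "fmov %s[out], s0\n\t"  /* copy the result to the output operand */
        : [out] "=w"(out)       /* output in any FP/SIMD register */
        : [in] "w"(x)           /* input in any FP/SIMD register */
        : "v0");                /* v0 is overwritten by this statement */
    return out;
#else
    /* Portable fallback so the sketch also builds on non-AArch64 hosts. */
    return x + x;
#endif
}
```

Without the `"v0"` entry, an optimizing compiler may leave a value it still needs in `v0` across the asm statement and use it afterwards, which is the kind of miscompilation this PR addresses for the `z` registers.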

martin-frbg added this to the 0.3.30 milestone Apr 3, 2025
vaiskv (Contributor, PR author) commented Apr 3, 2025

I came across an issue in the way the library is compiled and used in an application.

When the library is compiled with TARGET=ARMV8 DYNAMIC_ARCH=1 and run on a target with the SME feature, it fails with the error "OpenBLAS: Architecture Initialization failed. No initialization function found".

If the library is instead compiled with TARGET=ARMV9SME DYNAMIC_ARCH=1, the application works and calls to cblas_sgemm are dispatched to the SME-based sgemm implementation.

While debugging, I found that gotoblas_dynamic_init does populate gotoblas with gotoblas_ARMV9SME, but in the failing build its init function is NULL. When the library is compiled with TARGET=ARMV9SME, the init function is non-NULL and everything works.
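The failure mode described above can be modelled as a dispatch table whose selected entry carries a NULL init pointer. This is a simplified, hypothetical sketch: `core_table_t`, `init_sme()` and `dynamic_init_sketch()` are illustrative names, and OpenBLAS's real `gotoblas_t` and `gotoblas_dynamic_init` are considerably more involved.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified per-core kernel table. */
typedef struct {
    const char *name;
    void (*init)(void);   /* NULL if no init routine was compiled in */
} core_table_t;

static int sme_initialized = 0;

/* Stand-in for a core-specific init routine. */
static void init_sme(void) { sme_initialized = 1; }

/* Same core, two builds: one with its init routine, one without. */
core_table_t table_with_init    = { "ARMV9SME", init_sme };
core_table_t table_without_init = { "ARMV9SME", NULL };

/* Runs the selected core's init; returns -1 (and reports) if there is none. */
int dynamic_init_sketch(const core_table_t *selected)
{
    if (selected == NULL || selected->init == NULL) {
        fprintf(stderr, "Architecture initialization failed for %s: "
                        "no initialization function found\n",
                selected != NULL ? selected->name : "(none)");
        return -1;
    }
    selected->init();
    return 0;
}
```

In this model, selecting the right core is not enough: the table for that core must also have been built with its init routine, which matches the difference observed between the ARMV8 and ARMV9SME builds.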

Our requirement is to use a common library for all targets; if the target supports SME, cblas_sgemm should dispatch to the SME implementation.

Is there something missing in the integration? Please let me know if any changes are needed.

martin-frbg (Collaborator) commented:

Huh, looks like CPU type autodetection for ARMV9SME went missing in dynamic_arm64.c (at least I'm fairly sure we had it already). It should basically be done the same way as for ARMV8SVE, but using the support_sme1() function.
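A sketch of what runtime SME detection plus core selection might look like on Linux, assuming the kernel reports SME through `getauxval(AT_HWCAP2)` and the `HWCAP2_SME` bit. The helpers `cpu_has_sve()`, `cpu_has_sme()` and `pick_core()` and the `enum core` values are hypothetical; this is not the actual `support_sme1()` from dynamic_arm64.c, and on other platforms the helpers simply report no support.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, AT_HWCAP2 */
#endif

/* Bit values from the Linux arm64 uapi <asm/hwcap.h>; only used if the
 * C library headers do not already provide them. */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif
#ifndef HWCAP2_SME
#define HWCAP2_SME (1UL << 23)
#endif

/* Hypothetical core identifiers for this sketch. */
enum core { CORE_ARMV8, CORE_ARMV8SVE, CORE_ARMV9SME };

/* Returns 1 if the kernel reports SVE, else 0 (always 0 off Linux/AArch64). */
int cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;
#endif
}

/* Returns 1 if the kernel reports SME, else 0 (always 0 off Linux/AArch64). */
int cpu_has_sme(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SME) != 0;
#else
    return 0;
#endif
}

/* Pick the most capable core; SME is tested first so it wins when both
 * features are reported. */
enum core pick_core(int has_sve, int has_sme)
{
    if (has_sme)
        return CORE_ARMV9SME;
    if (has_sve)
        return CORE_ARMV8SVE;
    return CORE_ARMV8;
}
```

Keeping the SME check independent of the SVE check matters because some SME-capable cores do not implement non-streaming SVE, so nesting SME detection under an SVE test could miss them.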

vaiskv (Contributor, PR author) commented Apr 4, 2025

I added autodetection for SME by following this commit, but the issue still persists :(

martin-frbg (Collaborator) commented:

Are you testing on an Apple CPU, or on something else?

vaiskv (Contributor, PR author) commented Apr 4, 2025

I am testing on QEMU with SME enabled.
