frame pointer support to FP32 GEMM/GEMV JIT kernels for profiling#9
Open
Contributor
Author
|
I have submitted a PR for xbyak (herumi/xbyak#249). I may roll back to the initial approach (3982773) and update the dependency after it is merged.
Contributor
Author
|
The latest xbyak dev branch seems to work perfectly (herumi/xbyak#250). I'm going to use the initial approach.
dca3749 to
52ad5c9
Compare
Enable RBP as the frame pointer in the F32 GEMM, GEMM_RD, and GEMV JIT generators via UseRBPAsFramePointer to support accurate profiling and stack unwinding. This reduces the available temporary registers from 13 to 12; the shortfall is resolved by aliasing registers with non-overlapping lifetimes and spilling regIncK to the stack in GEMVM1.
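To illustrate why aliasing can absorb the lost register, here is a minimal sketch of the underlying idea: two values whose live ranges never overlap can share one physical register. This is not code from the PR; `LiveRange`, `lifetimesOverlap`, and `registersNeeded` are hypothetical names, and the instruction indices are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical live range of a logical value, given as inclusive
// instruction indices. Not taken from the actual JIT generators.
struct LiveRange {
    int firstUse;
    int lastUse;
};

// Two values can share a register only if their live ranges are disjoint.
bool lifetimesOverlap(const LiveRange& a, const LiveRange& b) {
    assert(a.firstUse <= a.lastUse && b.firstUse <= b.lastUse);
    return a.firstUse <= b.lastUse && b.firstUse <= a.lastUse;
}

// Greedy first-fit assignment: place each value in the first register
// whose current occupants never overlap it, otherwise open a new register.
// Returns the number of physical registers used.
std::size_t registersNeeded(const std::vector<LiveRange>& values) {
    std::vector<std::vector<LiveRange>> regs;
    for (const LiveRange& v : values) {
        bool placed = false;
        for (auto& occupants : regs) {
            bool clash = false;
            for (const LiveRange& o : occupants) {
                if (lifetimesOverlap(v, o)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                occupants.push_back(v);
                placed = true;
                break;
            }
        }
        if (!placed) {
            regs.push_back(std::vector<LiveRange>{v});
        }
    }
    return regs.size();
}
```

With 11 values live for a whole kernel plus two short-lived values that never coexist, `registersNeeded` reports 12 rather than 13. A value that cannot be aliased this way would instead have to be spilled to the stack, as the commit does for regIncK.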
52ad5c9 to
6f63112
Compare
AOCL-DLP JIT-generated kernels use Xbyak's StackFrame utility, which allocates RBP as a general-purpose scratch register instead of setting up a frame pointer chain. This prevents frame-pointer-based perf profiling tools from unwinding through JIT code and collecting CPU profiles.
This PR adds support for frame pointer unwinding in the FP32 GEMM/GEMV kernels by no longer using the RBP register as scratch and fixing the prologues and epilogues accordingly.
Update: I have upgraded the xbyak library to v7.36.1, which supports the UseRBPAsFramePointer option.
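For context, the sketch below simulates how a frame-pointer unwinder walks the stack and why a clobbered RBP stops it. It assumes the usual x86-64 convention after `push rbp; mov rbp, rsp`, where `[rbp]` holds the caller's saved RBP and `[rbp + 8]` holds the return address. The stack is modelled as a plain map, and `Memory`, `walkFramePointers`, `makeExampleStack`, and all addresses are hypothetical; this is not how perf or the PR implements anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simulated stack memory: address -> 8-byte value (illustration only).
using Memory = std::unordered_map<std::uint64_t, std::uint64_t>;

// Walk the frame-pointer chain starting at `rbp` and collect return addresses.
// The walk stops at a null frame pointer, an unmapped address, a saved RBP
// that does not point further up the stack, or after maxDepth frames.
std::vector<std::uint64_t> walkFramePointers(const Memory& mem,
                                             std::uint64_t rbp,
                                             std::size_t maxDepth = 64) {
    assert(maxDepth > 0);
    std::vector<std::uint64_t> returnAddresses;
    while (rbp != 0 && returnAddresses.size() < maxDepth) {
        auto savedRbp = mem.find(rbp);      // [rbp]     = caller's RBP
        auto retAddr = mem.find(rbp + 8);   // [rbp + 8] = return address
        if (savedRbp == mem.end() || retAddr == mem.end()) {
            break;  // RBP does not point at a valid frame record
        }
        returnAddresses.push_back(retAddr->second);
        // The stack grows down, so parent frames live at higher addresses.
        if (savedRbp->second != 0 && savedRbp->second <= rbp) {
            break;
        }
        rbp = savedRbp->second;
    }
    return returnAddresses;
}

// Hypothetical three-frame stack: JIT kernel <- caller <- outermost frame.
Memory makeExampleStack() {
    Memory mem;
    mem[0x1000] = 0x2000; mem[0x1008] = 0xAAAA;  // kernel frame
    mem[0x2000] = 0x3000; mem[0x2008] = 0xBBBB;  // caller frame
    mem[0x3000] = 0;      mem[0x3008] = 0xCCCC;  // outermost frame
    return mem;
}
```

Starting from `0x1000`, the walk recovers all three return addresses. If RBP instead holds an arbitrary scratch value such as `0x1234`, the first lookup fails and the unwinder recovers nothing, which is the situation described above when RBP is used as a scratch register inside JIT code.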