frame pointer support to FP32 GEMM/GEMV JIT kernels for profiling #9

Open
js00070 wants to merge 2 commits into amd:dev from js00070:zhiyi/stackframe_change

Conversation

Contributor

js00070 commented Apr 8, 2026

AOCL-DLP's JIT-generated kernels use Xbyak's StackFrame utility, which allocates RBP as a general-purpose scratch register instead of setting up a frame-pointer chain. As a result, frame-pointer-based profiling tools cannot unwind through the JIT code to collect CPU profiles.

This PR adds support for frame-pointer unwinding in the FP32 GEMM/GEMV kernels by no longer using the RBP register as scratch and adjusting the prologues and epilogues accordingly.

Update: I have upgraded the xbyak library to v7.36.1, which supports the UseRBPAsFramePointer option.
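
To illustrate why a clobbered RBP stops a frame-pointer unwinder, here is a minimal toy model in plain C++. It is not Xbyak or AOCL-DLP code: the "stack" is just a vector of 64-bit slots, and the names `UnwindResult`, `unwind`, `make_toy_stack`, and `check_toy_model`, as well as the slot layout and addresses, are assumptions made up for this sketch. Each frame record stores the caller's frame pointer followed by a return address, which mirrors the usual `push rbp; mov rbp, rsp` convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of walking the (toy) frame-pointer chain.
struct UnwindResult {
    std::vector<std::uint64_t> return_addresses;
    bool complete;  // true only if the walk reached the outermost frame
};

// Toy frame record at slot index rbp:
//   mem[rbp]     = caller's frame pointer (0 marks the outermost frame)
//   mem[rbp + 1] = return address
// The stack grows toward lower indices, so callers live at higher indices.
UnwindResult unwind(const std::vector<std::uint64_t>& mem, std::uint64_t rbp) {
    UnwindResult result{{}, false};
    while (rbp != 0) {
        // A value that cannot be a frame record means RBP was not a frame pointer.
        if (mem.size() < 2 || rbp > mem.size() - 2) return result;
        result.return_addresses.push_back(mem[rbp + 1]);
        const std::uint64_t caller_rbp = mem[rbp];
        // The chain must move toward older frames; otherwise it is corrupt.
        if (caller_rbp != 0 && caller_rbp <= rbp) return result;
        rbp = caller_rbp;
    }
    result.complete = true;
    return result;
}

// Hypothetical call chain: main -> gemm_driver -> jit_kernel.
std::vector<std::uint64_t> make_toy_stack() {
    std::vector<std::uint64_t> mem(16, 0);
    mem[12] = 0;  mem[13] = 0x1000;  // outermost frame record
    mem[8]  = 12; mem[9]  = 0x2000;  // gemm_driver's frame record
    mem[4]  = 8;  mem[5]  = 0x3000;  // jit_kernel's frame record
    return mem;
}

// Usage: compare a kernel that keeps RBP as a frame pointer with one that
// reuses RBP as scratch (modeled here by an arbitrary non-pointer value).
void check_toy_model() {
    const auto mem = make_toy_stack();
    const UnwindResult good = unwind(mem, 4);
    assert(good.complete && good.return_addresses.size() == 3);
    const UnwindResult bad = unwind(mem, 0xDEADBEEF);
    assert(!bad.complete && bad.return_addresses.empty());
}
```

In this model the walk succeeds only while every frame leaves a valid record behind RBP; once a kernel treats RBP as an ordinary temporary, the unwinder has nothing to follow, which is the situation the PR description reports for the JIT kernels.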

Contributor Author

js00070 commented Apr 9, 2026

I have submitted a PR to xbyak (herumi/xbyak#249). I may roll back to the initial approach (3982773) and update the dependency after it is merged.

Contributor Author

js00070 commented Apr 15, 2026

The latest xbyak dev branch seems to work perfectly (herumi/xbyak#250). I'm going to use the initial approach.

js00070 force-pushed the zhiyi/stackframe_change branch from dca3749 to 52ad5c9 on April 15, 2026 20:34
js00070 changed the title from "[draft] frame pointer support to FP32 GEMM/GEMV JIT kernels for profiling" to "frame pointer support to FP32 GEMM/GEMV JIT kernels for profiling" on Apr 15, 2026
Contributor Author

js00070 commented Apr 15, 2026

Update: I have upgraded the xbyak library to v7.36.1, which supports the UseRBPAsFramePointer option.

js00070 added 2 commits April 20, 2026 19:26
Enable RBP as frame pointer in F32 GEMM, GEMM_RD, and GEMV JIT
generators via UseRBPAsFramePointer to support accurate profiling
and stack unwinding. Reduces available temp registers from 13 to 12,
resolved by aliasing registers with non-overlapping lifetimes and
spilling regIncK to the stack in GEMVM1.
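
The aliasing idea in the commit message can be sketched with a small, self-contained C++ model. This is only an illustration under stated assumptions, not the PR's implementation: `LiveRange`, `can_alias`, `peak_pressure`, and `check_aliasing_model` are hypothetical names, and live ranges are modeled as inclusive instruction-index intervals. Two values can share one physical register when their ranges never overlap, and the peak number of simultaneously live values is the number of registers needed without spilling.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inclusive interval of instruction indices during which a value is live.
struct LiveRange {
    int first_use;
    int last_use;
};

// Two values may share a physical register if their live ranges are disjoint.
bool can_alias(const LiveRange& a, const LiveRange& b) {
    assert(a.first_use <= a.last_use && b.first_use <= b.last_use);
    return a.last_use < b.first_use || b.last_use < a.first_use;
}

// Peak number of values live at the same time. The maximum is always reached
// at the start of some range, so only those points need to be checked.
int peak_pressure(const std::vector<LiveRange>& ranges) {
    int peak = 0;
    for (const auto& r : ranges) {
        int live = 0;
        for (const auto& other : ranges) {
            if (other.first_use <= r.first_use && r.first_use <= other.last_use) {
                ++live;
            }
        }
        peak = std::max(peak, live);
    }
    return peak;
}

// Usage: three hypothetical temporaries, two of which never overlap.
void check_aliasing_model() {
    const LiveRange setup{0, 3};   // used only before the main loop
    const LiveRange loop{4, 9};    // used only inside the main loop
    const LiveRange stride{2, 6};  // overlaps both of the others
    assert(can_alias(setup, loop));
    assert(!can_alias(setup, stride));
    assert(peak_pressure({setup, loop, stride}) == 2);
}
```

When the peak still exceeds the number of free registers after aliasing, one value has to live in memory instead, which corresponds to the commit message's note about spilling regIncK to the stack in GEMVM1.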
js00070 force-pushed the zhiyi/stackframe_change branch from 52ad5c9 to 6f63112 on April 21, 2026 02:38