Conversation
|
Heyya @statusfailed thanks for the PR! I don't mind if you used CODEX to help diagnose and fix the issue. I'll be happy to review this shortly! |
|
Hello @statusfailed if you have some time is it possible if you rebase against the master branch? |
- Added uint32_t _pad0 after float a.
- Added uint32_t _pad1 after uint32_t n.
- Initialized the new padding fields to 0:
- Kept existing logical arguments unchanged (y, x, a, n), but fixed their byte layout so:
- n lands at byte offset 24.
- total copied kernarg size is 32 bytes (matching runtime-reported kernarg=32).
- Set ENABLE_SGPR_DISPATCH_PTR (bit 1). - Set ENABLE_SGPR_KERNARG_SEGMENT_PTR (bit 3). - Set IS_PTR64 (bit 19). - Set ENABLE_WAVEFRONT_SIZE32 (bit 10) for non-CDNA targets.
- Switched ELF type to `ET_DYN` for code objects. - Added program headers (`PT_LOAD` for .text, `PT_NOTE` for metadata note). - Set valid program-header table fields `(e_phoff, e_phentsize, e_phnum)`. - Set AMDGPU HSA ABI version byte (`e_ident[EI_ABIVERSION] = 1`, HSA V3-compatible).
|
Something in master has broken the patch, I'm not sure what yet. I'm getting errors like this: Not exactly sure if they're related |
- Add hidden dispatch kernarg fields (block_count_*, group_size_*) in launch_saxpy and populate them from launch dims. - Make isel SGPR layout conditional: use dispatch + kernarg SGPR pairs when needs_dispatch=1, otherwise keep kernarg-only layout.
32857b9 to
a1ce1bb
Compare
|
OK reworked it a bit and it works again. I get some warnings though, and had to modify launch_saxpy.c to include block count / group size hidden dispatch args. |
|
Hey @statusfailed, cheers for getting saxpy going on GFX1151 and for sticking with the rebase, that's never fun. Had a proper look through and there are a few things I want to flag before we merge anything. The TGID bit positions are the scary one. The PR puts CDNA on bits 11/12/13 and RDNA on 7/8/9, but the LLVM spec (AMDHSAKernelDescriptor.h) says WORKGROUP_ID_X/Y/Z are at bits 7/8/9 for everything from GFX6 to GFX120, no CDNA special case. Bits 11:12 are actually VGPR_WORKITEM_ID, so we'd be stomping on that instead of enabling workgroup IDs. We have 8/8 tests passing on MI300X right now with 7/8/9 and I'd rather not find out what happens when we stop telling the hardware where the workgroups are. The kernel_code_properties widening to uint32_t is also dodgy. AMD defines it as uint16_t, and the IS_PTR64 bit at position 19 doesn't exist in the AMDHSA kernel descriptor, it's from the older amd_kernel_code_t format that predates Code Object V3. We'd be scribbling into reserved bytes which is the kind of thing that works until it very suddenly doesn't. The WAVEFRONT_SIZE32 at bit 10 is legit though and we should have that for RDNA. On the ABI version going from 4 down to 1, that drops us from Code Object V6 to V3, but the msgpack metadata still says amdhsa.version [1, 2] which is V5. Admittedly master already has a mismatch here (V6 ABI with V5 metadata, my fault, not yours) and we get away with it because the ROCm runtime is pretty relaxed about the whole thing. But V3 with V5 hidden args is pushing our luck. My gut feeling is the actual GFX1151 problem was probably the missing WAVEFRONT_SIZE32 bit combined with the ABI version mismatch on master confusing the runtime's hidden arg population. Might be worth trying just the wavefront fix with a cleaned-up ABI version before we rearchitect the SGPR layout around dispatch_ptr. The flake.nix is great, happy to take that in a contributors repo if you want to split it out. |
|
Hey @statusfailed I've added better error messaging and handling so now whenever you experience a bug or an issue it'll produce a dump of the state of the compiler as it happened. Should help with the dev experience / if you recreate the bug a dump should be produced. If you need help managing the conflicts in the emitter please feel free to sing out. |
This patch was written by codex, so I'm not necessarily expecting a merge. But hopefully it helps!
Here's a summary (written by me) of the issues and fixes.
(XREF: Issue #39)
Problem 1: hsaco loading fails
First problem: unable to load the hsaco file:
Fix is in
eb04b1b7ecebe5e53794090bf3ad9e92d8eb1ab3, adds a couple of ELF segments and changes ELF header to include:Warning: it looks like the e_ident change might break other platforms. I'm not sure if this can be merged as-is.
Problem 2: memory aperture violation
With previous fix applied, we unlock a new error:
The fix for this is in
2119a9bc9d2981b12a1b2e76ac4af8783a7f06b0.tl;dr: sets some additional properties in the
amd_kernel_descriptor_t.Problem 3: test fails
Last problem, saxpy test fails with mismatches:
This is fixed by
32857b901393809ffa15fcbc136ba69a9b83b81d.tl;dr: adds some padding to the
argsstruct inlaunch_saxpy.cto (apparently) obey AMDHSA ABI.Result
With all these patches applied, I now see a working result: