[Examples]: Add Intel AMX BF16 matrix multiplication. #547

Jiajun-Ji · 2025-09-12T08:40:02Z

Implements AMX tile operations with system permission handling and comprehensive benchmarking. Includes an optimized linalg baseline and an AOT compilation chain. Achieves a ~2× speedup on a Sapphire Rapids 8488C (512×2048×1024 matrices).
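For readers unfamiliar with the BF16 format, the following is a minimal scalar sketch of what a BF16 matrix multiplication with FP32 accumulation computes. It is not the code from this PR: the helper names `float_to_bf16`, `bf16_to_float`, and `matmul_bf16_ref` are hypothetical, the row-major layout is an assumption, and the AMX hardware may accumulate in a different order and round differently.

```c
#include <stdint.h>
#include <string.h>

/* Convert FP32 to BF16 (the upper 16 bits of the FP32 encoding) using
   round-to-nearest-even. NaN handling is omitted for brevity. */
static uint16_t float_to_bf16(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
  return (uint16_t)((bits + rounding) >> 16);
}

/* Widen BF16 back to FP32 by placing it in the upper 16 bits. */
static float bf16_to_float(uint16_t h) {
  uint32_t bits = (uint32_t)h << 16;
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Reference C = A * B with BF16 inputs and FP32 accumulation.
   A is M x K, B is K x N, C is M x N, all row-major (an assumption). */
static void matmul_bf16_ref(const uint16_t *A, const uint16_t *B, float *C,
                            int M, int N, int K) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k)
        acc += bf16_to_float(A[i * K + k]) * bf16_to_float(B[k * N + j]);
      C[i * N + j] = acc;
    }
  }
}
```

A scalar reference like this is mainly useful for checking the numerical output of a tiled kernel within a tolerance, since BF16 keeps only 8 bits of significand precision.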
FileCheck testing is not supported because AMX tile operations require system-level permissions and AOT compilation; JIT-based testing frameworks like FileCheck cannot initialize AMX state or handle the required syscalls.
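The permission step mentioned above can be sketched as follows. This is a minimal illustration of how a Linux process typically asks the kernel for permission to use AMX tile data, not the contents of `amx-wrapper.c`. The function name `request_amx_permission` and the `AMX_XFEATURE_XTILEDATA` macro name are hypothetical; the numeric constants follow the Linux x86 UAPI and assume kernel 5.16 or newer.

```c
#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the Linux x86 UAPI (asm/prctl.h), defined here in case the
   installed headers are older than the running kernel. */
#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM 0x1022
#endif
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM 0x1023
#endif
/* XSAVE feature number of the AMX tile data state (hypothetical macro name). */
#define AMX_XFEATURE_XTILEDATA 18

/* Ask the kernel for permission to use AMX tile data.
   Returns 0 if permission is granted, -1 otherwise (old kernel,
   no AMX hardware, or the request was denied). */
static int request_amx_permission(void) {
  unsigned long features = 0;
  if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, AMX_XFEATURE_XTILEDATA) != 0)
    return -1;
  /* Read back the permitted feature bitmask to confirm the grant. */
  if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &features) != 0)
    return -1;
  return (features & (1UL << AMX_XFEATURE_XTILEDATA)) ? 0 : -1;
}
#else
/* On other platforms AMX is unavailable, so always report failure. */
static int request_amx_permission(void) { return -1; }
#endif
```

A program would call this once at startup, before configuring or loading any tile, and fall back to a non-AMX path when it returns -1. Executing tile instructions without the grant is expected to terminate the process with a signal, which is why the request has to happen inside the compiled binary itself.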

examples/BuddyMatmul/amx-wrapper.c


zhanghb97 reviewed Sep 17, 2025


examples/BuddyMatmul/amx-wrapper.c (resolved)

zhanghb97 added the "final review" and "format issue" labels Sep 17, 2025

Jiajun-Ji added 2 commits September 22, 2025 14:03

[Example]: Add Intel AMX BF16 matrix multiplication.

c8592f9

Implements AMX tile operations with system permission handling and comprehensive benchmarking. Includes optimized linalg baseline, AOT compilation chain. Achieves ~2× speedup on Sapphire Rapids 8488C (512×2048×1024 matrices).

format the code and add the licence header.

3a9b11a

Jiajun-Ji force-pushed the AMX-Optimize branch from da31331 to 3a9b11a September 22, 2025 06:03
