Home
Nallani Bhaskar edited this page Mar 18, 2026 · 9 revisions
AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) is a high-performance library providing optimized deep learning primitives for AMD processors. It implements GEMM operations for machine learning applications, supporting multiple data types, fused pre/post-operations, and batch processing -- all tuned to leverage AMD hardware capabilities including AVX2, AVX512, AVX512_VNNI, AVX512_BF16, and AVX512_FP16 instruction sets.
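To make "GEMM with a fused post-operation" concrete, the following is a minimal, unoptimized reference sketch in plain C of what such a call computes: a matrix multiply followed by a bias add and a ReLU activation applied in the same pass. It does not use the AOCL-DLP API. The function name `gemm_bias_relu_ref`, the row-major layout, and the float-only data type are assumptions chosen for illustration; see the GEMM Guide and Post-Operations Guide for the library's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) GEMM with a fused bias + ReLU post-op.
 * Row-major layout: A is m x k, B is k x n, C is m x n, bias has n entries.
 * Computes C[i][j] = max(0, sum_p A[i][p] * B[p][j] + bias[j]).
 * NOTE: gemm_bias_relu_ref is a hypothetical name, not an AOCL-DLP symbol. */
static void gemm_bias_relu_ref(size_t m, size_t n, size_t k,
                               const float *A, const float *B,
                               const float *bias, float *C)
{
    assert(A && B && bias && C);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            /* Post-ops applied before the single store to C:
             * per-column bias, then ReLU. */
            acc += bias[j];
            C[i * n + j] = acc > 0.0f ? acc : 0.0f;
        }
    }
}
```

Fusing the post-ops this way means each output element is written once, rather than being stored by the GEMM and then re-read by separate bias and activation passes.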
New here? Start with the Quick Start Guide to build, install, and run your first GEMM in 5 minutes.
- Quick Start -- Install, build your first program, and run it
- Integration Guide -- CMake packages, manual linking, static vs dynamic, troubleshooting
- Examples & Tutorials -- Annotated code examples for every feature
- Library Overview -- Architecture, components, data types, hardware abstraction
- GEMM Guide -- Data type combinations, memory layouts, matrix reordering, choosing the right variant
- Post-Operations Guide -- Fused post-ops (BIAS, activations, SCALE, MATRIX_ADD/MUL) via dlp_metadata_t
- Eltwise Operations Guide -- Standalone element-wise operations (separate from GEMM post-ops)
- Quantization Guide -- Symmetric quantization, mixed-precision workflows, scale/zero-point setup
- API Lifecycle -- End-to-end flow: data prep, post-ops setup, compute, threading
- Performance Guide -- Threading, NUMA, memory layout, architecture-specific tips
- Environment Variables -- Complete reference for DLP_NUM_THREADS, AOCL_ENABLE_INSTRUCTIONS, OpenMP tuning
- DLP Testing -- Google Test framework, YAML configs, running and writing tests
- DLP Benchmarking -- Google Benchmark framework, YAML configs, performance analysis
- JIT Code Generation -- Just-In-Time compilation system, Xbyak assembler, kernel debugging
- FAQ -- Common questions about threading, linking, data types, and performance
- API Reference (Sphinx) -- Full generated API documentation
- README -- Feature summary and data type table
- BUILD.md -- Build configuration and CMake options
- INSTALL.md -- Installation steps
- Contributing -- How to contribute
- License -- BSD 3-Clause
Wiki sections: Getting Started, User Guides, Performance & Config, Testing & Benchmarking, Developer Guides, Reference