Nallani Bhaskar edited this page Mar 18, 2026 · 9 revisions

AOCL-DLP Documentation Hub

AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) is a high-performance library providing optimized deep learning primitives for AMD processors. It implements GEMM operations for machine learning applications, supporting multiple data types, fused pre/post-operations, and batch processing -- all tuned to leverage AMD hardware capabilities including AVX2, AVX512, AVX512_VNNI, AVX512_BF16, and AVX512_FP16 instruction sets.

New here? Start with the Quick Start Guide to build, install, and run your first GEMM in 5 minutes.
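To ground the terminology: a GEMM (general matrix multiply) computes `C = alpha * A*B + beta * C`. The sketch below is a plain-C naive reference version for illustration only; it is not the AOCL-DLP API, and the optimized library kernels replace this triple loop with blocked, vectorized (AVX2/AVX512) implementations.

```c
#include <stddef.h>

/* Naive reference GEMM: C = alpha * A*B + beta * C, row-major.
 * No blocking, threading, or vectorization -- purely illustrative,
 * NOT the AOCL-DLP API. */
static void ref_sgemm(size_t m, size_t n, size_t k,
                      float alpha, const float *A, const float *B,
                      float beta, float *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```

The library's value is in everything this sketch omits: cache blocking, matrix reordering, multi-threading, low-precision data types, and fused post-operations.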


Getting Started

User Guides

  • Library Overview -- Architecture, components, data types, hardware abstraction
  • GEMM Guide -- Data type combinations, memory layouts, matrix reordering, choosing the right variant
  • Post-Operations Guide -- Fused post-ops (BIAS, activations, SCALE, MATRIX_ADD/MUL) via dlp_metadata_t
  • Eltwise Operations Guide -- Standalone element-wise operations (separate from GEMM post-ops)
  • Quantization Guide -- Symmetric quantization, mixed-precision workflows, scale/zero-point setup
  • API Lifecycle -- End-to-end flow: data prep, post-ops setup, compute, threading

Performance & Configuration

Testing & Benchmarking

  • DLP Testing -- Google Test framework, YAML configs, running and writing tests
  • DLP Benchmarking -- Google Benchmark framework, YAML configs, performance analysis

Developer Guides

Reference

  • FAQ -- Common questions about threading, linking, data types, and performance
  • API Reference (Sphinx) -- Full generated API documentation

Project Links
