Examples and Tutorials

Nallani Bhaskar edited this page Mar 18, 2026 · 3 revisions

AOCL-DLP ships with example programs in the examples/classic/ directory. Build them with:

```shell
cd aocl-dlp
mkdir build && cd build
cmake -DBUILD_EXAMPLES=ON ..
make -j$(nproc)
```

Compiled examples are in build/examples/classic/.

Example Catalog

Basic GEMM

| Example | Description | Key concepts |
|---|---|---|
| simple_gemm_f32.c | Float32 matrix multiplication | Basic GEMM call, row-major layout |
| simple_gemm_bf16.c | BFloat16 GEMM | BF16 input type, f32 accumulation |
| simple_gemm_s8.c | Signed int8 GEMM | Integer quantized GEMM |
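
When checking the output of simple_gemm_f32.c, it helps to have a plain reference for the standard GEMM update C = alpha * A * B + beta * C in row-major layout. The sketch below does not call AOCL-DLP: the function name `ref_gemm_f32_rowmajor` and its argument order are hypothetical, chosen only to mirror the usual BLAS-style m, n, k, lda, ldb, ldc parameters. Skipping the read of C when beta == 0 follows the common BLAS convention; whether AOCL-DLP behaves identically is an assumption to verify.

```c
#include <stddef.h>

/* Hypothetical plain-C reference (not an AOCL-DLP function) for
 *   C = alpha * A * B + beta * C, all matrices row-major.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * Element (i, j) of a row-major matrix with leading dimension ld sits at
 * index i * ld + j, so each ld must be >= that matrix's column count. */
static void ref_gemm_f32_rowmajor(size_t m, size_t n, size_t k,
                                  float alpha, const float *a, size_t lda,
                                  const float *b, size_t ldb,
                                  float beta, float *c, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;               /* dot product of row i of A and column j of B */
            for (size_t p = 0; p < k; ++p)
                acc += a[i * lda + p] * b[p * ldb + j];
            /* BLAS convention: with beta == 0, C is written but never read,
             * so uninitialized or NaN contents of C do not propagate. */
            if (beta == 0.0f)
                c[i * ldc + j] = alpha * acc;
            else
                c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

Comparing the library's result with a loop like this on small matrices is a quick sanity check; expect small differences on larger inputs, because optimized kernels sum the k products in a different order.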

Mixed Precision

| Example | Description | Key concepts |
|---|---|---|
| simple_gemm_bf16s8.c | BF16 activations with int8 weights | Mixed-precision, on-the-fly quantization |
| simple_gemm_f32s8.c | F32 activations with int8 weights | Mixed-precision quantized inference |
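
BF16 appears in several of these examples. A bf16 value is the upper 16 bits of an IEEE-754 float32 (sign, 8 exponent bits, 7 mantissa bits), so widening to f32 is exact and narrowing needs rounding. The sketch below illustrates the number format and the "BF16 inputs, f32 accumulation" pattern; it is not AOCL-DLP's own conversion code, the helper names are hypothetical, and NaN inputs are deliberately not handled.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical storage type: raw bf16 bit pattern */

/* float32 -> bf16 with round-to-nearest-even. NaN inputs are not handled. */
static bf16_t f32_to_bf16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits without UB */
    uint32_t lsb = (bits >> 16) & 1u;    /* lowest bit that survives truncation */
    bits += 0x7FFFu + lsb;               /* ties round toward an even result */
    return (bf16_t)(bits >> 16);
}

/* bf16 -> float32 is exact: move the bits to the high half, zero the rest. */
static float bf16_to_f32(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Dot product of bf16 vectors accumulated in float32: the per-element
 * pattern a BF16 GEMM with f32 accumulation applies to each output. */
static float dot_bf16_f32acc(const bf16_t *a, const bf16_t *b, size_t k)
{
    float acc = 0.0f;
    for (size_t p = 0; p < k; ++p)
        acc += bf16_to_f32(a[p]) * bf16_to_f32(b[p]);
    return acc;
}
```

Accumulating in f32 rather than bf16 matters because bf16 keeps only about 3 significant decimal digits; summing thousands of products at that precision would lose most of the result.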

Post-Operations

| Example | Description | Key concepts |
|---|---|---|
| simple_gemm_with_bias.c | GEMM with fused bias addition | dlp_metadata_t, BIAS post-op |
| simple_gemm_with_relu.c | GEMM with fused ReLU activation | ELTWISE post-op, RELU |
| post_ops_combinations.c | Multiple chained post-operations | Chaining BIAS + ELTWISE, seq_vector |
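
Conceptually, a post-op chain is a list of operations applied, in order, to each output element after its dot product is accumulated. The sketch below models that with a hypothetical enum standing in for the library's seq_vector; it does not reproduce dlp_metadata_t. It assumes one bias value per output column (per output channel), which is typical for inference but should be checked against simple_gemm_with_bias.c.

```c
#include <stddef.h>

/* Hypothetical post-op kinds for this sketch only (not AOCL-DLP types). */
typedef enum { SKETCH_BIAS, SKETCH_RELU } sketch_post_op;

/* Apply a post-op chain, in order, to an m x n row-major output matrix.
 * A fused kernel runs this inner loop on each element while it is still in
 * registers; it is written as a separate pass here only for clarity. */
static void apply_post_ops(float *c, size_t m, size_t n, size_t ldc,
                           const sketch_post_op *seq, size_t seq_len,
                           const float *bias /* assumed length n, one per column */)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = c[i * ldc + j];
            for (size_t s = 0; s < seq_len; ++s) {
                switch (seq[s]) {
                case SKETCH_BIAS: v += bias[j]; break;              /* add column bias */
                case SKETCH_RELU: v = v > 0.0f ? v : 0.0f; break;   /* max(v, 0) */
                }
            }
            c[i * ldc + j] = v;
        }
    }
}
```

The order of the chain changes the result: BIAS then RELU clamps after the bias is added, whereas RELU then BIAS can leave negative bias values in the output.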

Batch & Advanced

| Example | Description | Key concepts |
|---|---|---|
| batch_gemm.c | Batch GEMM for multiple matrices | aocl_batch_gemm_*, group_count |
| matrix_reorder.c | Pre-reorder weights for repeated use | aocl_reorder_*, mem_format_b = 'R' |
| quantization.c | Symmetric quantization workflow | DLP_SYMM_STAT_QUANT, sym_quant APIs |
| eltwise_ops.c | Standalone element-wise operations | aocl_gemm_eltwise_ops_* |
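
The arithmetic behind a symmetric quantization workflow can be sketched in a few lines: the zero point is 0, scale = max|x| / 127, and q = round(x / scale) is clamped to [-127, 127]. An int8 GEMM then accumulates in int32, and multiplying by scale_a * scale_b recovers an f32 approximation. This sketch assumes per-tensor scales, round-half-away-from-zero, and tightly packed row-major matrices; the granularity and rounding used by DLP_SYMM_STAT_QUANT and the sym_quant APIs may differ, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tensor symmetric quantizer: returns the scale and writes
 * q[i] = clamp(round(x[i] / scale), -127, 127). Zero point is always 0. */
static float sym_quant_s8(const float *x, size_t len, int8_t *q)
{
    float amax = 0.0f;
    for (size_t i = 0; i < len; ++i) {
        float ax = x[i] < 0.0f ? -x[i] : x[i];
        if (ax > amax) amax = ax;
    }
    float scale = amax > 0.0f ? amax / 127.0f : 1.0f;   /* avoid dividing by zero */
    for (size_t i = 0; i < len; ++i) {
        float v = x[i] / scale;
        long r = (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);  /* round half away from zero */
        if (r > 127) r = 127;
        if (r < -127) r = -127;
        q[i] = (int8_t)r;
    }
    return scale;
}

/* int8 x int8 GEMM with int32 accumulation, then dequantize each output.
 * qa is m x k and qb is k x n, both row-major and tightly packed. */
static void s8_gemm_dequant(size_t m, size_t n, size_t k,
                            const int8_t *qa, float scale_a,
                            const int8_t *qb, float scale_b, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;   /* int32 holds k products of magnitude <= 127*127 */
            for (size_t p = 0; p < k; ++p)
                acc += (int32_t)qa[i * k + p] * (int32_t)qb[p * n + j];
            c[i * n + j] = (float)acc * scale_a * scale_b;
        }
    }
}
```

Clamping to -127 rather than -128 keeps the range symmetric, so negating a quantized value never overflows int8.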

Multi-Instance & Utilities

| Example | Description | Key concepts |
|---|---|---|
| multi_instance_gemm_f32.c | Multiple GEMM instances in parallel | Thread-local settings, concurrent calls |
| multi_instance_gemm_u8s8.c | Multi-instance quantized GEMM | Parallel quantized inference |
| version.c | Query library version | dlp_version_query() |
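
The multi-instance pattern runs several independent GEMMs at once, each owning its own A, B, and C buffers so no locking is needed. The sketch below shows that structure with POSIX threads; a plain triple loop stands in for the library call, and `gemm_job` and both functions are hypothetical. If each library call is itself multithreaded, the total thread count multiplies; the "thread-local settings" in multi_instance_gemm_f32.c presumably address this, and that example shows the actual calls.

```c
#include <pthread.h>
#include <stddef.h>

/* One independent GEMM problem per thread (hypothetical type). */
typedef struct {
    size_t m, n, k;
    const float *a, *b;   /* row-major, tightly packed inputs */
    float *c;             /* output owned exclusively by this job */
} gemm_job;

/* Thread entry point. The loop is a stand-in for an AOCL-DLP GEMM call. */
static void *run_gemm_job(void *arg)
{
    gemm_job *job = arg;
    for (size_t i = 0; i < job->m; ++i)
        for (size_t j = 0; j < job->n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < job->k; ++p)
                acc += job->a[i * job->k + p] * job->b[p * job->n + j];
            job->c[i * job->n + j] = acc;
        }
    return NULL;
}

/* Run each job on its own thread and join all started threads.
 * Returns 0 on success, -1 if count exceeds the fixed limit of 16,
 * or the pthread_create error code if a thread could not be started. */
static int run_gemm_jobs_concurrently(gemm_job *jobs, size_t count)
{
    pthread_t tids[16];
    if (count > 16)
        return -1;
    size_t started = 0;
    int err = 0;
    for (; started < count; ++started) {
        err = pthread_create(&tids[started], NULL, run_gemm_job, &jobs[started]);
        if (err != 0)
            break;
    }
    for (size_t t = 0; t < started; ++t)
        pthread_join(tids[t], NULL);
    return err;
}
```

Because no job touches another job's buffers, the results do not depend on thread scheduling; this is the property that makes concurrent calls safe.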

Suggested Learning Path

If you are new to AOCL-DLP, work through the examples in this order:

  1. Quick Start -- Build and run your first program (inline example)
  2. simple_gemm_f32.c -- Understand basic GEMM parameters
  3. simple_gemm_with_bias.c -- Learn how post-ops work
  4. matrix_reorder.c -- Optimize for repeated inference
  5. batch_gemm.c -- Process multiple matrices efficiently
  6. quantization.c -- Use integer quantization for inference

Then explore the guides for deeper understanding:

Building Examples Against an Installed Library

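If AOCL-DLP is already installed on your system, you can build the examples standalone. The commands below assume the default /usr/local install prefix; adjust the -I and -L paths if you installed elsewhere.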

```shell
# Using the shared library
gcc -o simple_gemm_f32 simple_gemm_f32.c -I/usr/local/include -L/usr/local/lib -laocl-dlp -lm
# If /usr/local/lib is not on the dynamic loader's search path, set LD_LIBRARY_PATH before running

# Using the static library
gcc -o simple_gemm_f32 simple_gemm_f32.c -I/usr/local/include -L/usr/local/lib \
    -Wl,--whole-archive -laocl-dlp_static -Wl,--no-whole-archive -lstdc++ -lm -fopenmp
```

See the Integration Guide for CMake-based builds and troubleshooting.
