

Nallani Bhaskar edited this page Mar 18, 2026 · 3 revisions

Post-Operations Guide

AOCL-DLP can fuse common operations (bias addition, activations, scaling, matrix arithmetic) directly into GEMM computation. This avoids separate passes over the output matrix and reduces memory traffic.

The effective computation becomes:

C = post_ops( alpha * op(A) * op(B) + beta * C )

The dlp_metadata_t Structure

All post-operations are configured through a single dlp_metadata_t struct passed as the last argument to any GEMM function. Pass NULL when no post-ops are needed.

#include <aocl_dlp.h>

dlp_metadata_t meta = {0};  // zero-initialize

// Configure the post-op sequence (see below)
// ...

aocl_gemm_f32f32f32of32('R', 'N', 'N', m, n, k,
    1.0f, a, lda, 'N', b, ldb, 'N',
    0.0f, c, ldc, &meta);

Key Fields

Field        Type                       Description
seq_length   md_t                       Number of post-operations to apply
seq_vector   DLP_POST_OP_TYPE*          Array defining the order of post-ops
bias         dlp_post_op_bias*          Bias parameters (when BIAS is in the sequence)
eltwise      dlp_post_op_eltwise*       Eltwise/activation parameters
scale        dlp_scale_t*               Scale and zero-point parameters
matrix_add   dlp_post_op_matrix_add*    Matrix addition parameters
matrix_mul   dlp_post_op_matrix_mul*    Matrix multiplication parameters
num_eltwise  md_t                       Number of eltwise operations (when multiple are chained)

Execution Order

Post-ops execute in the order defined by seq_vector. For example, if seq_vector = {BIAS, ELTWISE}, bias is applied first, then the activation.

Post-Op Types

BIAS -- Add a Bias Vector

Adds a 1D bias vector (length n) to each row of the output matrix.

// Bias vector (one value per output column)
float bias_values[N] = { /* ... */ };

dlp_post_op_bias bias_op = {
    .bias      = bias_values,
    .stor_type = DLP_F32,     // data type of bias values
    .sf        = NULL,        // scale factor (NULL if not needed)
    .zp        = NULL         // zero point (NULL if not needed)
};

DLP_POST_OP_TYPE seq[] = { BIAS };

dlp_metadata_t meta = {0};
meta.seq_length = 1;
meta.seq_vector = seq;
meta.bias       = &bias_op;

aocl_gemm_f32f32f32of32('R', 'N', 'N', m, n, k,
    1.0f, a, lda, 'N', b, ldb, 'N',
    0.0f, c, ldc, &meta);
// Result: C[i][j] = (A * B)[i][j] + bias[j]

ELTWISE -- Activation Functions

Applies an element-wise activation function to the output.

// RELU example
dlp_post_op_eltwise eltwise_op = {
    .sf   = NULL,
    .algo = {
        .alpha     = NULL,        // unused for RELU
        .beta      = NULL,        // unused for RELU
        .algo_type = RELU,
        .stor_type = DLP_F32
    }
};

DLP_POST_OP_TYPE seq[] = { ELTWISE };

dlp_metadata_t meta = {0};
meta.seq_length  = 1;
meta.seq_vector  = seq;
meta.eltwise     = &eltwise_op;
meta.num_eltwise = 1;

Supported activation functions:

DLP_ELT_ALGO_TYPE   Formula                      Parameters
RELU                max(0, x)                    None
PRELU               x >= 0 ? x : alpha * x       alpha: leak factor
GELU_TANH           GELU, tanh approximation     None
GELU_ERF            GELU, exact erf form         None
CLIP                clamp(x, alpha, beta)        alpha: min, beta: max
SWISH               x * sigmoid(alpha * x)       alpha: scaling factor
TANH                tanh(x)                      None
SIGMOID             1 / (1 + exp(-x))            None

PRELU example with alpha parameter:

float alpha_val = 0.01f;

dlp_post_op_eltwise prelu_op = {
    .sf   = NULL,
    .algo = {
        .alpha     = &alpha_val,
        .beta      = NULL,
        .algo_type = PRELU,
        .stor_type = DLP_F32
    }
};

CLIP example with min/max:

float clip_min = -1.0f;
float clip_max = 1.0f;

dlp_post_op_eltwise clip_op = {
    .sf   = NULL,
    .algo = {
        .alpha     = &clip_min,
        .beta      = &clip_max,
        .algo_type = CLIP,
        .stor_type = DLP_F32
    }
};
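A SWISH configuration follows the same pattern; this is a sketch that assumes the same struct layout as the PRELU and CLIP examples above, with alpha carrying the scaling factor from the table:

```c
float swish_alpha = 1.0f;

dlp_post_op_eltwise swish_op = {
    .sf   = NULL,
    .algo = {
        .alpha     = &swish_alpha,   // scaling factor in x * sigmoid(alpha * x)
        .beta      = NULL,           // unused for SWISH
        .algo_type = SWISH,
        .stor_type = DLP_F32
    }
};
```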

SCALE -- Scaling and Zero-Point

Applies per-channel or per-tensor scaling to the output, with optional zero-point offset.

float scale_vals[] = { 0.5f, 0.5f, /* ... one per column */ };
dlp_sf_t sf = {
    .scale_factor     = scale_vals,
    .scale_factor_len = n,           // per-channel (or 1 for per-tensor)
    .scale_factor_type = DLP_F32
};

dlp_scale_t scale_op = {
    .sf = &sf,
    .zp = NULL   // or provide a dlp_zp_t for zero-point
};

DLP_POST_OP_TYPE seq[] = { SCALE };

dlp_metadata_t meta = {0};
meta.seq_length = 1;
meta.seq_vector = seq;
meta.scale      = &scale_op;

MATRIX_ADD -- Element-wise Addition

Adds another matrix to the GEMM output, with optional scaling.

float residual[M * N] = { /* ... */ };

dlp_post_op_matrix_add add_op = {
    .matrix    = residual,
    .ldm       = n,           // leading dimension of the added matrix
    .stor_type = DLP_F32,
    .sf        = NULL         // optional scale factor
};

DLP_POST_OP_TYPE seq[] = { MATRIX_ADD };

dlp_metadata_t meta = {0};
meta.seq_length = 1;
meta.seq_vector = seq;
meta.matrix_add = &add_op;
// Result: C[i][j] = (A * B)[i][j] + residual[i][j]

MATRIX_MUL -- Element-wise Multiplication

Multiplies the GEMM output element-wise with another matrix.

float mask[M * N] = { /* ... */ };

dlp_post_op_matrix_mul mul_op = {
    .matrix    = mask,
    .ldm       = n,
    .stor_type = DLP_F32,
    .sf        = NULL
};

DLP_POST_OP_TYPE seq[] = { MATRIX_MUL };

dlp_metadata_t meta = {0};
meta.seq_length = 1;
meta.seq_vector = seq;
meta.matrix_mul = &mul_op;
// Result: C[i][j] = (A * B)[i][j] * mask[i][j]

Chaining Multiple Post-Ops

Post-ops can be chained by listing multiple types in seq_vector. They execute left to right.

Example: BIAS + RELU (common in neural networks)

float bias_values[N] = { /* ... */ };

dlp_post_op_bias bias_op = {
    .bias = bias_values, .stor_type = DLP_F32, .sf = NULL, .zp = NULL
};

dlp_post_op_eltwise relu_op = {
    .sf = NULL,
    .algo = { .alpha = NULL, .beta = NULL, .algo_type = RELU, .stor_type = DLP_F32 }
};

DLP_POST_OP_TYPE seq[] = { BIAS, ELTWISE };

dlp_metadata_t meta = {0};
meta.seq_length  = 2;
meta.seq_vector  = seq;
meta.bias        = &bias_op;
meta.eltwise     = &relu_op;
meta.num_eltwise = 1;

// Result: C = RELU(A * B + bias)

Example: SCALE + GELU_TANH + BIAS (reusing scale_op and bias_op from the sections above)

dlp_post_op_eltwise gelu_op = {
    .sf   = NULL,
    .algo = {
        .alpha     = NULL,        // unused for GELU_TANH
        .beta      = NULL,        // unused for GELU_TANH
        .algo_type = GELU_TANH,
        .stor_type = DLP_F32
    }
};

DLP_POST_OP_TYPE seq[] = { SCALE, ELTWISE, BIAS };

dlp_metadata_t meta = {0};
meta.seq_length  = 3;
meta.seq_vector  = seq;
meta.scale       = &scale_op;
meta.eltwise     = &gelu_op;
meta.bias        = &bias_op;
meta.num_eltwise = 1;

// Result: C = GELU(scale * (A * B)) + bias

Tips

  • Align buffers -- Align bias, scale, and residual matrix buffers to 64-byte boundaries for best performance.
  • Match data types -- Ensure stor_type of post-op parameters matches the accumulator type of your GEMM variant. For float GEMM, use DLP_F32. For integer GEMM, scale/bias still use DLP_F32 since post-ops operate on the accumulator.
  • Zero-initialize metadata -- Always start with dlp_metadata_t meta = {0} to avoid uninitialized fields.
  • Maximum post-ops -- Up to AOCL_MAX_POST_OPS (8) post-operations can be chained.

See Also

  • GEMM Guide -- GEMM parameters, data types, and reordering
  • Eltwise Guide -- Standalone element-wise ops (not fused with GEMM)
  • Quantization Guide -- Scale/zero-point setup for quantized workflows
  • Examples -- simple_gemm_with_bias.c, simple_gemm_with_relu.c, post_ops_combinations.c
  • API Reference -- Generated struct documentation
