Divisions are much more expensive than multiplications on modern hardware. Replacing a loop-invariant division with a multiplication often results in speed boost.
Calculate the reciprocal outside of the loop and replace the division with multiplication with a reciprocal
If a hot loop involves a division with a loop-invariant constant, instead of performing the division in each iteration of the loop, one could do the following:
-
For the expression
A / B
, calculate the reciprocal of the denominator (RECIP_B = 1.0 / B
) and put it outside of the loop. -
Replace the expression
A / B
, useA * RECIP_B
.
Have a look at the following code:
float calc_div_recip(float *out, float *in, int n, float b) {
for (int i = 0; i < n; ++i) {
out[i] = in[i] / b;
}
}
In the above example, the denominator of division in[i] / b
is loop-invariant,
i.e. its value doesn't change while the loop is running. Therefore, we can
calculate the reciprocal of b
and replace the division with the multiplication
by the reciprocal:
float calc_div_recip(float *out, float *in, int n, float b) {
float recip_b = 1.0f / b;
for (int i = 0; i < n; ++i) {
out[i] = in[i] * recip_b;
}
}
Note
This optimization can result in a small change of the result compared to
original code. Many compilers support relaxed IEEE 754 floating point
optimization (e.g. -ffast-math
). If relaxed IEEE 754 mode is enabled with the
compiler, the compiler might do this optimization automatically. Implementing
it in the source code might not bring the expected speedup, but it makes the
optimization explicit in the code.