Strided array access may impact performance.
Consider using techniques like loop fusion, loop interchange, loop tiling or changing the data layout to avoid non-sequential access in hot loops.
Accessing an array using a non-unit stride is less efficient than accessing consecutive positions because the latter improves the locality of reference. Note that C/C++ column-wise matrix access is an example non-unit stride, where the stride is the column width.
The following code shows a loop with a strided access to array a
with stride
2
. Avoiding it would require changing the data layout of the program:
void example(float *a, unsigned size) {
for (unsigned i = 0; i < size; i += 2) {
a[i] = 0.0f;
}
}
Another code with strided accesses is show below. In this case, both variables
a
and b
have a stride LEN
:
for (int i = 0; i < LEN; ++i) {
for (int j = 1; j < LEN; j++) {
a[j * LEN + i] = a[j * LEN + i - 1] + b[j * LEN + i];
}
}
Note that by using loop interchange, the loop order changes from ij
to ji
.
The resulting code shown below has sequential accesses (i.e., stride 1
) for
variables ij
and b
in the scope of the innermost loop. As a result, a
simple code change solves the issue in this scenario, without requiring
disruptive changes in data layout:
for (int j = 1; j < LEN; ++j) {
for (int i = 0; i < LEN; i++) {
a[j * LEN + i] = a[j * LEN + i - 1] + b[j * LEN + i];
}
}
The following code shows a loop with a strided access to array a
with stride
2
. Avoiding it would require changing the data layout of the program:
subroutine example(a)
real, intent(out) :: a(:)
integer :: i
do i = 1, size(a, 1), 2
a(i) = 0.0
end do
end subroutine example
Another code with strided accesses is show below. In this case, both variables
a
and b
have dimensions (LEN, LEN)
, and thus are implicitly accessed with
a stride of LEN
elements as Fortran uses column-major order:
do i = 1, size(a, 1)
do j = 2, size(a, 2)
a(i, j) = a(i, j - 1) + b(i, j)
end do
end do
Note that by using loop interchange, the loop order changes from ij
to ji
.
The resulting code shown below has sequential accesses (i.e., stride 1
) in the
scope of the innermost loop. As a result, a simple code change solves the issue
in this scenario, without requiring disruptive changes in data layout:
do j = 2, size(a, 2)
do i = 1, size(a, 1)
a(i, j) = a(i, j - 1) + b(i, j)
end do
end do
-
Memory access pattern (strided access pattern)