Benchmark Variants


H.R. Zohouri edited this page Aug 26, 2018 · 1 revision

Each benchmark should provide the following variants, which differ in their optimizations and parallelization schemes. Each benchmark should also include a README_fpga file that describes the variants in more detail.

Version 0 (v0)

Minimal changes to be compatible with AOCL
- Ahead-of-time compilation
- Device selection
- No struct kernel parameters
- Aggregating multiple kernel files into a single file
- restrict qualifier on kernel pointer arguments
No optimization
Originally developed for GPUs with NDRange kernels

Version 1 (v1)

Unoptimized single work-item kernels, with restrict and ivdep added where necessary to allow correct pipelining

Version 2 (v2)

NDRange
Basic optimizations
- Work-group size (reqd_work_group_size or max_work_group_size)
- SIMD lanes (num_simd_work_items)
- Compute units (num_compute_units)

Version 3 (v3)

Single work-item
Basic optimizations
- Shift register for floating-point reduction
- Unrolling
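
The shift-register reduction listed above can be sketched in plain C as follows. This is a minimal illustration, not code from any benchmark: the function name sum_shift_register and the depth SHIFT_REG_DEPTH are hypothetical, and the depth is assumed to be at least the latency of the floating-point adder on the target device. Each addition reads the oldest partial sum rather than the one written in the previous iteration, so the loop-carried dependency is spread over several iterations.

```c
#include <stddef.h>

/* Assumed depth: should cover the floating-point adder latency on the target. */
#define SHIFT_REG_DEPTH 8

/* Hypothetical example: sums n floats using a shift register of partial sums. */
float sum_shift_register(const float *restrict data, size_t n)
{
    float shift_reg[SHIFT_REG_DEPTH + 1];

    /* Initialize every partial sum to zero. */
    for (int j = 0; j < SHIFT_REG_DEPTH + 1; j++)
        shift_reg[j] = 0.0f;

    for (size_t i = 0; i < n; i++) {
        /* Accumulate into the oldest partial sum and write it to the tail. */
        shift_reg[SHIFT_REG_DEPTH] = shift_reg[0] + data[i];

        /* Shift all elements one position toward the head.
           In a kernel this inner loop would be fully unrolled. */
        for (int j = 0; j < SHIFT_REG_DEPTH; j++)
            shift_reg[j] = shift_reg[j + 1];
    }

    /* Final reduction over the partial sums held in the register. */
    float sum = 0.0f;
    for (int j = 0; j < SHIFT_REG_DEPTH; j++)
        sum += shift_reg[j];

    return sum;
}
```

Because the partial sums are combined in a different order than in a simple sequential loop, the floating-point result can differ slightly from the sequential one for inputs that are not exactly representable.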

Version 4 (v4)

NDRange
Advanced optimizations
- Kernel rewrite
- Local memory access reduction

Version 5 (v5)

Single work-item
Advanced optimizations
- Kernel rewrite
- Shift register as on-chip buffer
- Loop blocking/tiling
- Loop collapse
- Exit condition optimization
- Systolic array
- etc.
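
As an illustration of the loop collapse item above, the following plain C sketch shows a two-level loop nest next to a collapsed version with a single loop counter and manually updated row and column indices. The function names, the array layout, and the computation are all hypothetical and chosen only to make the example self-contained; real kernels would apply the same idea to their own loop nests.

```c
#include <stddef.h>

/* Hypothetical nested version: adds (row - col) to every element. */
void offset_nested(float *restrict out, const float *restrict in,
                   int rows, int cols)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[r * cols + c] = in[r * cols + c] + (float)(r - c);
}

/* Collapsed version: one loop with a single exit condition on idx.
   The row and column indices are maintained by hand inside the body. */
void offset_collapsed(float *restrict out, const float *restrict in,
                      int rows, int cols)
{
    const long total = (long)rows * (long)cols;
    int r = 0;
    int c = 0;

    for (long idx = 0; idx < total; idx++) {
        out[idx] = in[idx] + (float)(r - c);

        /* Advance the column, wrapping to the next row at the end. */
        c++;
        if (c == cols) {
            c = 0;
            r++;
        }
    }
}
```

Both functions produce the same output; the collapsed form simply replaces the nest with one loop whose trip count is rows * cols.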