Benchmark Variants
H.R. Zohouri edited this page Aug 26, 2018 · 1 revision
Each benchmark should have the following variants, which apply different optimizations and parallelization schemes. Each benchmark should also have a README_fpga file that describes the variants in more detail.
- Minimal changes to be compatible with AOCL
  - Ahead-of-time compilation
  - Device selection
  - No struct kernel parameter
  - Aggregating multiple kernel files into a single file
  - restrict
- No optimization
  - Originally developed for GPUs with NDRange kernels
- Unoptimized single work-item kernels with restrict and ivdep if necessary to allow correct pipelining
- NDRange
  - Basic optimizations
    - work-group size (`reqd_work_group_size` or `max_work_group_size`)
    - simd lanes (`num_simd_work_items`)
    - compute units (`num_compute_units`)
- Single work-item
  - Basic optimizations
    - Shift register for floating-point reduction
    - Unrolling
- NDRange
  - Advanced optimizations
    - Kernel rewrite
    - Local memory access reduction
- Single work-item
  - Advanced optimizations
    - Kernel rewrite
    - Shift register as on-chip buffer
    - Loop blocking/tiling
    - Loop collapse
    - Exit condition optimization
    - Systolic array
    - etc.
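
The "Shift register for floating-point reduction" item in the list above can be illustrated with a minimal sketch in plain C. It is not taken from any of the benchmarks: the function name `shift_register_sum` and the depth `SHIFT_REG_DEPTH` are hypothetical, chosen only for illustration. In an actual single work-item OpenCL kernel, the fixed-bound inner loops would typically be fully unrolled with `#pragma unroll` so that the compiler can infer the array as a shift register.

```c
#include <stddef.h>

/* Hypothetical depth: assumed to be at least the latency (in cycles)
 * of the floating-point adder on the target device. */
#define SHIFT_REG_DEPTH 8

/* Sums n floats by rotating partial sums through a shift register,
 * instead of accumulating into a single variable. */
float shift_register_sum(const float *data, size_t n)
{
    float shift_reg[SHIFT_REG_DEPTH + 1];

    /* Start with all partial sums at zero
     * (this loop would be fully unrolled in a kernel). */
    for (int i = 0; i < SHIFT_REG_DEPTH + 1; ++i)
        shift_reg[i] = 0.0f;

    for (size_t k = 0; k < n; ++k) {
        /* Add the new element to the oldest partial sum and write the
         * result to the tail; it is not read again as an addend until
         * SHIFT_REG_DEPTH iterations later. */
        shift_reg[SHIFT_REG_DEPTH] = shift_reg[0] + data[k];

        /* Shift every entry one position towards the head
         * (this loop would be fully unrolled in a kernel). */
        for (int j = 0; j < SHIFT_REG_DEPTH; ++j)
            shift_reg[j] = shift_reg[j + 1];
    }

    /* Collapse the remaining partial sums into the final result. */
    float sum = 0.0f;
    for (int i = 0; i < SHIFT_REG_DEPTH; ++i)
        sum += shift_reg[i];

    return sum;
}
```

Because each partial sum is only reused after `SHIFT_REG_DEPTH` iterations, the loop-carried dependency on a single accumulator is removed, which is what allows the main loop to be pipelined with one new element per cycle. Note that this changes the order of the additions, so the result can differ slightly from a strictly sequential floating-point sum.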